It is time for every business to own a data warehouse.
The value of a data warehouse that consolidates data from many different sources, such as applications, databases, and files, has been well understood in the enterprise market for decades, to the point that it is the norm. These enterprises made significant investments in data warehousing technology and reaped the benefits in the form of access to information that enabled faster, smarter, and more informed decision-making.
Over time, smaller enterprises and some mid-market companies, having realized the advantages of data warehousing, started investing in a data consolidation strategy aided by technology from companies such as Oracle and Microsoft and by open source technologies such as MySQL and Postgres. However, setting up a data warehouse, creating the ETL jobs that move, transform, and load data into it, and creating a reporting strategy required reliance on various resources skilled in database administration, data modeling, performance tuning, ETL development, reporting, and visualization development, among other skills. A cursory glance at a careers site will highlight roles like database administrator, ETL developer, report developer, database developer, BI architect, and BI developer, all of which were required to get up and running on a data warehousing strategy. Keep in mind, we haven’t even spoken about analytics, machine learning, and AI at this point. We will save that for another day.
Now, let’s break down what is happening in the industry and why we believe that every company should now operate a data warehouse. Let’s look at some of the factors that lead us to believe that data warehousing, once an aspirational goal, has now become an achievable one for small and mid-market companies as well, at scale.
For the longest time, data warehouses ran on-premises. Just like your car, they required constant maintenance, along with specialized resources who knew how to make the database run efficiently, keep it running at all times, recover it if it broke down, keep it from getting broken into, and ensure it didn’t lose your data. These highly sought-after resources are expensive, and the good ones are hard to find.
To create a database, organizations had to go through a long-drawn-out procurement process: determine what server capacity they needed, determine what licenses they required if they were buying licensed database software, and then use internal database experts or hire external DBAs to get the database up and running once the servers had arrived and been provisioned. Today, it requires a swipe of a credit card to create a database in the cloud.
Database administration also became easier as cloud vendors added tools that simplified managing the database. That meant small administration teams could manage more databases in the same amount of time. Over the last few years, however, further advances in technology have resulted in fully managed database services like Google’s BigQuery, Oracle’s Autonomous Data Warehouse, Snowflake, and others. These databases can, to varying degrees, run on their own, require no maintenance, ensure data protection, secure themselves, and tune themselves to provide optimal performance at all times. So, if you had swiped the credit card a minute ago while reading the previous paragraph, you would have a database by now, ready to accept incoming data. Moreover, it requires no maintenance: it is an autonomous car that, for the most part, you do not have to take to a garage or show to a mechanic.
Several database management technologies have been on the market for many years. Each had its pros and cons, its own value proposition, and people with strong opinions on which one was better at what. The traditional database players, such as Oracle and Teradata, were enterprise-focused, while the likes of SQL Server were friendlier to mid-market companies. Companies that did not have deep pockets, had applications that didn’t require enterprise features, or simply had a team that leaned towards open source relied on the likes of MySQL, PostgreSQL, and other open source technologies. But there was still a cost to be paid for the metal to run the database, for the excess capacity that had to be purchased before actual demand caught up, and for all the professional help required to get the database up and running efficiently. Add to this the fact that a data warehouse could sit idle for a good part of the day while you still paid for it as if it were running at full throttle, akin to an 8-cylinder engine that guzzles gas while waiting for the light to turn green. Thankfully, we now have cars that turn themselves off or run on fewer cylinders to conserve gas in low-demand situations. Cloud data warehouses now offer similar capabilities: customers don’t have to pay for full capacity at all times but can instead scale up resources when they need to and pay only for what they use.
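To put rough numbers on the idling engine, here is a minimal back-of-the-envelope sketch in Python. The hourly rate and the hours of actual use are made-up figures chosen only for illustration; they are not any vendor's pricing, and real cloud pricing models vary.

```python
# Illustrative arithmetic only: every number here is a hypothetical assumption.
HOURLY_RATE = 2.0          # assumed cost per hour of warehouse compute
HOURS_IN_MONTH = 24 * 30   # a 30-day month
BUSY_HOURS = 4 * 30        # assume the warehouse does real work 4 hours a day


def always_on_cost(hourly_rate, hours_in_period):
    """Cost when you pay for full capacity around the clock."""
    return hourly_rate * hours_in_period


def usage_based_cost(hourly_rate, busy_hours):
    """Cost when you pay only for the hours the warehouse is actually working."""
    return hourly_rate * busy_hours


fixed = always_on_cost(HOURLY_RATE, HOURS_IN_MONTH)
metered = usage_based_cost(HOURLY_RATE, BUSY_HOURS)
savings = 1 - metered / fixed

print(f"Always on: ${fixed:,.2f} per month")
print(f"Pay per use: ${metered:,.2f} per month")
print(f"Savings: {savings:.0%}")
```

Under these assumed figures, the idle hours account for most of the always-on bill, which is the point of the engine analogy.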
When considered together, the innovations in database technologies and newer pricing models have made it really simple and affordable to spin up a data warehouse.
Which brings me to the next topic: how do you move the data to the data warehouse?
ETL VS ELT
ETL stands for Extract, Transform, and Load, an acronym familiar to anyone who has worked on a data warehouse. This technology is another critical component of the data warehousing stack. Traditional vendors such as Informatica, Oracle, Microsoft (with SSIS), and Talend have long been dominant in this space. Their technology made it possible to connect to a variety of data sources and transform the data to fit the needs of the business before loading it into the data warehouse. Some of these tools allowed pushdown transformations, where the transformations ran in the data warehouse instead of on a separate middle tier. These products have enabled many data warehouse initiatives and still command a significant presence in the enterprise market. They also require specialized resources, such as ETL developers, who can operate them.
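For readers who have not seen an ETL job, the sketch below shows the general shape of one in Python. It is a toy illustration, not how any of the products named above work: the source rows, the `transform` and `run_etl` functions, and the `orders` table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import sqlite3

# Extract: hypothetical rows pulled from a source application.
source_rows = [
    {"order_id": 1, "customer": " Alice ", "amount_cents": 1250},
    {"order_id": 2, "customer": "bob", "amount_cents": 400},
]


def transform(row):
    """Transform: clean a row in the middle tier, before it reaches the warehouse."""
    return (
        row["order_id"],
        row["customer"].strip().title(),   # tidy up customer names
        row["amount_cents"] / 100.0,       # convert cents to currency units
    )


def run_etl(rows):
    """Load: write the already-transformed rows into the warehouse table."""
    warehouse = sqlite3.connect(":memory:")  # stand-in for a real warehouse
    warehouse.execute(
        "CREATE TABLE orders (order_id INTEGER, customer TEXT, amount REAL)"
    )
    warehouse.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)", [transform(r) for r in rows]
    )
    warehouse.commit()
    return warehouse


warehouse = run_etl(source_rows)
etl_result = warehouse.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(etl_result)  # [(1, 'Alice', 12.5), (2, 'Bob', 4.0)]
```

Note that the cleanup logic lives in application code outside the warehouse, which is why this style of pipeline has traditionally needed dedicated developers to build and maintain it.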
As database technology has evolved, so have the applications that support business operations. Today, most companies run on best-of-breed SaaS applications that support various business functions. Applications such as Salesforce, Oracle, Workday, HubSpot, Google, Facebook Ads, and many others have made it easy and affordable for small and mid-sized businesses to access the same software that a large enterprise uses. They have also given rise to the API economy.
This massive shift to SaaS applications over the last two decades and the rapid ascent of cloud computing have allowed us to look at ETL technology through a new lens. Vendors like us have created a new breed of ELT (Extract, Load, Transform) technology that reduces the need for specialized resources to bring data to your data warehouse. We can now replicate data to your data warehouse in a matter of minutes, without you having to write a single line of code or manage any infrastructure, while supporting a wide variety of applications, databases, files, webhooks, and other means of data ingestion. You still, however, have to transform the data, but that transformation can happen in the data warehouse using SQL, a language that has been around for decades and is known to many resources.
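As a rough illustration of the ELT pattern, the Python sketch below lands raw data untouched and then transforms it with SQL inside the warehouse. It is a simplified example under stated assumptions, not a depiction of our product: the `raw_orders` table, the `customer_revenue` view, and the sample rows are hypothetical, and an in-memory SQLite database again stands in for a cloud data warehouse.

```python
import sqlite3

# Hypothetical raw records, exactly as a replication tool might deliver them.
raw_orders = [
    (1, " Alice ", 1250),
    (2, "bob", 400),
    (3, "bob", 600),
]

warehouse = sqlite3.connect(":memory:")  # stand-in for a cloud data warehouse

# Load: land the data as-is in a raw table, with no cleanup on the way in.
warehouse.execute(
    "CREATE TABLE raw_orders (order_id INTEGER, customer TEXT, amount_cents INTEGER)"
)
warehouse.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_orders)

# Transform: plain SQL that runs inside the warehouse, after the load.
warehouse.execute(
    """
    CREATE VIEW customer_revenue AS
    SELECT LOWER(TRIM(customer)) AS customer,
           SUM(amount_cents) / 100.0 AS revenue
    FROM raw_orders
    GROUP BY LOWER(TRIM(customer))
    """
)

elt_result = warehouse.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(elt_result)  # [('alice', 12.5), ('bob', 10.0)]
```

The design choice worth noticing is that the only logic a person has to write here is the SQL; the loading step carries no business rules at all, which is what makes it possible to automate.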
Similar advancements in reporting and visualization technology, together with the low cost of licenses, mean that a small business with five users can get access to good dashboarding software for less than $50 a month!
When you combine all these technologies, a business can now get up and running with a data warehouse at a fraction of the cost and with significantly fewer resources than was possible only a few years ago.
We set up Saras Analytics with one goal: to enable small to mid-sized companies to own their data and leverage it to power their next growth phase. Our suite of products and services is designed, developed, and priced to enable rapid adoption of data warehousing and analytics by all companies, especially the ones that have not yet taken to this world.
For those looking for a quick summary, the video highlighting the advancements in pit stop technology is a good analogy for data warehousing as well. The only difference is that the pit stop, in this case the data warehouse, is no longer limited to a Ferrari; it is available to your neighborhood tire shop as well! So, why wait when you can get up and running with a data warehouse in minutes and start unlocking the secrets hidden in your data? If you still prefer to sit in the lobby of a tire shop and read a car magazine while your tires are getting replaced, then know that it is a choice you made while the world around you is zooming past.