Snowflake, Snowflake Architecture, and Key Features
Snowflake is a cloud-based data warehouse created in 2012 by three data warehousing experts who were formerly at Oracle Corporation. Over the last eight years, Snowflake Computing, the vendor behind the Snowflake Cloud Data Warehouse product, has raised over $400 million and acquired thousands of customers. One might wonder if there is a need for another data warehouse vendor in an already crowded field comprising traditional data warehousing technologies like Oracle, Teradata, and SQL Server, and cloud data warehouses like Amazon Redshift and Google BigQuery. The answer lies in the disruption caused by cloud technologies and the opportunities the cloud has afforded new technology companies. Public clouds have enabled startups to shed past baggage, learn from the past, challenge the status quo, and take a fresh look at the opportunities the cloud provides to create a novel data warehouse product. In this article, we introduce Snowflake and touch upon the core technology components that make up this modern data warehouse, built entirely in the cloud and for consumers of cloud technologies.
You can register for a $400 free trial of Snowflake within minutes. This credit is enough to store a terabyte of data and run a small data warehouse environment for a few days.
What is Snowflake Architecture?
Before we jump into the architecture of Snowflake, it is worthwhile to discuss the concept of clustering and the popular clustering techniques.
Clustering Architectures: Shared Nothing and Shared Disk
The demands on applications to be online and available at all times are increasing daily. Meeting these expectations, however, places a substantial operational burden on the underlying computing infrastructure. Loss of functionality, an under-performing technology stack, or unavailable systems can be a death knell for businesses whose revenue models depend on the constant availability and performance of their technology stack. Downtime can result from planned events, like patching or upgrades, or unplanned events, like hardware failures or natural disasters. As companies increasingly become global organizations, they need systems that operate 24x7.
Clustering is the default go-to methodology for increasing the availability and performance of computing infrastructure. Clustering, simply put, is the deployment of multiple processors or independent systems to tackle a problem faster and more reliably than a single processor, while appearing as a single unit to the user issuing the command. However, the devil is always in the details.
What is the purpose of clustering?
Clustering is generally the go-to option to provide enhanced scalability and availability of the applications. Clusters improve scalability by providing options to supplement more computing power to the application infrastructure when required. Clusters improve availability as they ensure the availability of processing power despite the failure of one or more processing units.
Well-designed cluster manager software handles these topology changes seamlessly, making them invisible to the end user. Availability, usually measured in multiple 9s, is typically the primary goal of any clustering exercise. As mentioned earlier, however, clusters also allow additional computing power to be added when required to meet the demands of application processing.
What are the different types of clustering?
There are two predominant approaches to clustering: shared-disk and shared-nothing architectures.
Shared-Disk Architecture
In this setup, all computing nodes share the same disk or storage device. Every computing node (processor) has its own private memory; however, all processors can access all disks. Since all nodes have access to the same data, cluster control software is required to monitor and manage the processing of data, so that all nodes see a consistent copy of the data as it undergoes inserts, updates, or deletes. Attempts by two (or more) nodes to concurrently update the same data must be forbidden.
Enforcing these management criteria degrades the performance and scalability of shared-disk systems. Typically, a shared-disk architecture is well-suited to large-scale processing demanding ACID compliance; Oracle Real Application Clusters is one such example of a shared-disk architecture. Shared disk is most feasible for applications and services requiring only limited shared data access, as well as applications or workloads that are difficult to partition. Applications that undergo frequent updates are usually better off in a shared-nothing architecture, because the shared-disk lock management controller can become a bottleneck.
Shared-Nothing Architecture
In a shared-nothing setup, each computing node has its own private memory and its own storage or disk capacity, neither of which is shared. Networking interconnects provide communication between these nodes. When a processing request comes in, a router directs it to the appropriate computing node for fulfilment; business rules are generally applied at this routing layer to distribute traffic efficiently across nodes. In a shared-nothing setup, when a computing node fails, its processing rights are transferred to another node in the cluster.
This transfer of ownership ensures no disruption to the processing of user requests. A shared-nothing architecture offers a high degree of availability and scalability to the application. Modern web-scale technology companies like Google, which pioneered the implementation of shared-nothing architectures, run geographically distributed shared-nothing clusters comprising thousands of computing nodes. This is why a shared-nothing clustering architecture is the ideal choice for a read-heavy analytical data processing system like a data warehouse.
Shared-Disk vs Shared-Nothing – A quick comparison
| Shared-Disk | Shared-Nothing |
| --- | --- |
| Expensive hardware with redundancy to handle component failure | Typically built on commodity hardware |
| High availability | Node availability is low, but system availability is high |
| Relatively low scalability | High scalability |
| Preferred for OLTP systems that require ACID compliance | Preferred in environments with high read/write rates |
| Data is partitioned and striped, but within the storage array | Data may be partitioned and distributed across the cluster |
Back to Snowflake Architecture
Snowflake relies on standard computing infrastructure, i.e. the virtual machines available to anyone in a public cloud environment. In AWS, this is EC2; in GCP, it is Compute Engine. Virtual Warehouses form a critical component of the Snowflake architecture. These virtual warehouses, by design, can process massive volumes of data with a high degree of efficiency and performance. When an incoming query is detected, computing power becomes available immediately to process the request. As in other database technologies, intelligent caching ensures optimal utilization of resources and reduces the interaction between compute and storage systems. Snowflake can deploy multiple virtual warehouses to process requests while simultaneously maintaining transactional integrity, keeping the system ACID compliant.
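As an illustration, a virtual warehouse is created and sized with a few lines of SQL; the warehouse name and settings below are placeholders:

```sql
-- Create an extra-small virtual warehouse that suspends itself
-- after 60 seconds of inactivity and resumes on the next query.
CREATE WAREHOUSE IF NOT EXISTS reporting_wh
  WITH WAREHOUSE_SIZE = 'XSMALL'
       AUTO_SUSPEND = 60
       AUTO_RESUME = TRUE
       INITIALLY_SUSPENDED = TRUE;

-- Point the current session at the new warehouse.
USE WAREHOUSE reporting_wh;
```

AUTO_SUSPEND and AUTO_RESUME are what keep an idle warehouse from accruing compute charges between queries.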
Snowflake relies on the scalable cloud blob storage available in public clouds like AWS, Azure, and GCP. Relying on massively distributed storage systems enables Snowflake to provide the high degree of performance, reliability, availability, capacity, and scalability required by the most demanding data warehousing workloads. The storage layer of Snowflake is architected to scale independently of the compute layer. This design choice works out well for the consumer in terms of both performance and cost. The storage layer holds the data, tables, and query results for Snowflake.
By segregating compute and storage, Snowflake can fulfil and scale read requests and write requests without having to prioritize one over the other. This segregation is one of the unique features of Snowflake, made possible by its ground-up redesign of the data warehouse stack. Storage management is handled entirely by Snowflake, leaving nothing to the end user. As data loads into Snowflake, algorithms take over to process and partition the incoming data and create metadata. This metadata enables efficient query processing down the line. Columnar compression applied to these partitions optimizes the utilization of space and improves query performance. The data is also encrypted to meet the highest standards of security required by enterprise companies.
The services layer of Snowflake is where all the intelligent action happens. This layer performs functions such as authenticating users, managing the cluster, executing and optimizing queries, security, encryption, and orchestrating transaction execution. It runs on compute nodes that are stateless and span the entire data center. Intelligent use of metadata distributed across the cluster of computing nodes maintains the global state of transactions and the system.
When a query is issued, the services layer parses and compiles it, determines which partitions hold the data of interest, and flags those partitions for scanning. One would expect this metadata processing to take up sizable computing power, and they wouldn't be wrong to think so. However, by design, metadata processing happens on a separate cluster of machines, which reduces the load on the compute resources actually processing the user's data.
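For example, a query with a selective filter lets the services layer skip any partition whose metadata (such as per-column minimum and maximum values) shows it cannot contain matching rows; the database, table, and column names below are hypothetical:

```sql
-- Only partitions whose metadata shows order_date values overlapping
-- March 2020 are flagged for scanning; the rest are pruned before
-- any compute resources touch the data.
SELECT customer_id,
       SUM(order_total) AS monthly_spend
FROM   sales.public.orders
WHERE  order_date BETWEEN '2020-03-01' AND '2020-03-31'
GROUP BY customer_id;
```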
Snowflake’s Multi-Cluster Shared-Data Architecture
Snowflake removes the management constraints typical of conventional data platforms. Snowflake is a cloud-native data warehousing platform. The system design offers a high degree of performance while eliminating administration overhead. The database is fully managed and scales automatically based on the demands of the workload. Built-in performance tuning, infrastructure management, and optimization capabilities provide businesses with peace of mind. All they need to do is bring their data and leave its management to Snowflake.
The Snowflake architecture is fully distributed, spans multiple availability zones and regions, and is highly fault-tolerant to hardware failures. Users of Snowflake rarely notice the impact of any failure in the underlying hardware.
Security is one of the hallmarks of the Snowflake architecture. Data is encrypted both in transit and at rest. Snowflake supports multiple authentication mechanisms, including two-factor authentication and federated authentication with support for SSO. It also provides role-based access control and capabilities to restrict access based on pre-defined criteria. Snowflake holds a host of certifications, including HIPAA compliance and SOC 2 Type II. Refer to the Snowflake security documentation for more details.
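As a sketch of role-based access control in Snowflake SQL, a read-only role can be defined and granted to a user; the role, database, schema, and user names here are placeholders:

```sql
-- Create a role and grant it read-only access to one schema.
CREATE ROLE IF NOT EXISTS analyst;
GRANT USAGE ON DATABASE sales TO ROLE analyst;
GRANT USAGE ON SCHEMA sales.public TO ROLE analyst;
GRANT SELECT ON ALL TABLES IN SCHEMA sales.public TO ROLE analyst;

-- Assign the role to a user; the user now sees only what the role allows.
GRANT ROLE analyst TO USER jane_doe;
```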
Sharing and Collaboration
Snowflake offers a unique feature that lets data owners share their data with partners or other consumers without creating a new copy of the data. The consumer of the data pays only for processing, as there is no data movement involved and none of their storage is utilized. Avoid the hassles of FTP or email by using the native sharing features provided by Snowflake, which you can invoke via native SQL.
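As a sketch, the provider side of such a share might look like the following; the database, schema, table, and consumer account identifiers are placeholders:

```sql
-- Create a share and expose one table through it.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales.public.orders TO SHARE sales_share;

-- Allow a consumer account to attach the share; no data is copied.
ALTER SHARE sales_share ADD ACCOUNTS = partner_account;
```

The consumer then creates a database from the share and queries it with their own compute, which is why only processing is billed to them.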
Snowflake is the only fully managed data warehouse that is available in multiple clouds while retaining the same user experience. Snowflake meets its users where they are comfortable and by doing so, reduces the need to move data back and forth from their cloud environment to Snowflake over the internet. Snowflake is available on Amazon Web Services, Google Cloud Platform and Microsoft Azure.
Performance and Scalability
Snowflake is well known for its performance capabilities. By enabling compute and storage to scale separately, Snowflake eliminated one of the biggest bottlenecks of traditional database technologies while preserving everything good about traditional RDBMS technologies. Users can start by specifying a cluster size for initial deployment and scale as needed, even while the system is up and running. Scaling operations are handled by Snowflake, transparently to users.
Snowflake offers a simplified pricing experience. A true pay-per-use model supports billing on a per-second basis: users pay only for the storage they use and the computing power deployed to process their requests. There are no upfront costs or extensive planning needed to get started with a data warehousing initiative. Clusters scale up to process heavy workloads and scale back down to the pre-defined size automatically, and users are billed for the expanded capacity only for the duration of use.
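Resizing a warehouse, or letting a multi-cluster warehouse expand and contract between predefined bounds, is a single statement; the warehouse name below is a placeholder, and note that multi-cluster settings require the Enterprise edition:

```sql
-- Resize an existing warehouse, even while it is running.
ALTER WAREHOUSE reporting_wh SET WAREHOUSE_SIZE = 'LARGE';

-- Let the warehouse scale out to at most three clusters under heavy
-- concurrency and shrink back to one cluster when demand drops.
ALTER WAREHOUSE reporting_wh SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3;
```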
Where does Snowflake run?
Snowflake on Amazon Web Services (AWS)
Snowflake was initially released on Amazon Web Services (AWS) and is a cloud-native data warehousing platform for loading, analyzing, and reporting on large amounts of data. Conventional on-premises technologies like Oracle and Teradata are expensive for many small to mid-sized businesses; procuring and installing hardware and the expense of installing and maintaining software are just a couple of the reasons. Snowflake, on the other hand, is deployed in the cloud and becomes available to users within minutes. Snowflake's pricing model delivers incredible flexibility for organizations of all sizes to adopt a data warehouse as a unified data store for reporting and analytics. AWS users can spin up their Snowflake environment directly from the AWS Marketplace.
Snowflake on Azure
Snowflake later launched on the Microsoft Azure cloud platform. This launch gave companies already on Azure the choice between Azure SQL Data Warehouse and Snowflake as their data warehousing technology. Read this blog to get started with Snowflake on Azure. Customers who run Snowflake on Azure can also benefit from one of the industry-leading business intelligence products, Microsoft Power BI. By co-locating the data warehouse and Power BI, customers avoid the latency typically involved in moving data from the data warehouse to the cloud environment hosting the business intelligence software. Follow this link to see how easy it is to connect Power BI to Snowflake.
Snowflake on Google Cloud
In June 2019, Google announced a strategic partnership with Snowflake, offering Google Cloud Platform customers the option to leverage Snowflake as a data warehousing technology in addition to Google BigQuery. Looker and Snowflake are one of the most prominent technology combinations in the market currently. With the acquisition of Looker and the availability of Snowflake, GCP customers can now benefit from the shared synergy and co-existence of these products within the same cloud environment.
Saras Analytics is an official Snowflake ETL Partner. Our product, Daton, seamlessly replicates data from various data sources into Snowflake without you having to write a single line of code. With 100+ connectors to different data sources, Daton is the fastest and easiest way to replicate data to Snowflake.
Looking for Snowflake Alternatives?