Google Cloud Platform – Google BigQuery Pricing
Google Cloud Platform was the first public cloud to offer a genuine, petabyte-scale data warehouse service. BigQuery is an enterprise-grade data warehouse that was released in 2010. Over the last 9 years, the team at Google has made significant improvements to the Google BigQuery service, and today it is considered one of the most economical fully managed data warehouses on the market.
Google BigQuery is the first genuinely serverless data-warehouse-as-a-service offering on the market. There are no servers to manage, no patches to apply, and no upgrades to perform. The role of a database administrator in a BigQuery environment is to architect the schemas and optimize the partitions for performance and cost. The service automatically scales to meet the demands of any query without intervention from a database administrator. BigQuery also introduced an unusual query pricing model: it is based not on provisioned compute capacity but on the amount of data processed by each incoming query.
The best part about BigQuery is that you can load data to the service and start using it immediately. Users no longer have to worry about what runs under the hood because implementation details are hidden from them. All you need is a mechanism to load data into BigQuery and the ability to write SQL queries.
It is good practice to understand the architecture of BigQuery. Understanding the architecture helps in controlling costs, optimizing query performance, and optimizing storage. The factors that govern BigQuery costs are storage and the amount of query data processed. Let’s look at each in more detail.
BigQuery offers a scalable and flexible pricing model to meet your technical requirements and your budget.
Any data warehouse has storage needs. In BigQuery, storage is classified into two types.
- Active Storage – Storage used by tables or partitions that have been modified in the last 90 days.
- Long-term Storage – Storage used by tables or partitions that have not been modified in the last 90 days.
Confusing? It can be. Let us demystify this for you.
It is essential to understand this distinction because the amount you pay for BigQuery might vary significantly based on how your tables and partitions have been defined.
For example, consider a table with 10 GB of data, all stored in a single partition. When you insert a new row into this table, the new row is written to that same partition, causing the entire partition to be considered active.
Assume that the table above holds order data for the last 10 years, and that daily queries only touch data created in the last month. In this case, partitioning by the month and year of the order creation date breaks the table into many smaller partitions. When a new record is inserted, only the partition for the current month is impacted. With this logic, only 3 or 4 smaller partitions are considered active at any given time (remember the 90-day rule?) instead of the single large partition. As a result, storage pricing drops because the majority of the partitions are billed as long-term storage and only a few are billed as active storage.
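The 90-day rule described above can be sketched in a few lines of code. The following Python illustration (partition names and dates are hypothetical) classifies monthly partitions into active and long-term storage based on when they were last modified:

```python
from datetime import date

ACTIVE_WINDOW_DAYS = 90  # partitions modified within this window bill at the active rate


def classify_partitions(last_modified_by_partition, today):
    """Split partitions into active vs. long-term storage by last-modified date."""
    active, long_term = [], []
    for partition, modified in last_modified_by_partition.items():
        if (today - modified).days < ACTIVE_WINDOW_DAYS:
            active.append(partition)
        else:
            long_term.append(partition)
    return active, long_term


# Monthly partitions: only recent months are touched by newly inserted orders.
today = date(2019, 12, 1)
partitions = {
    "orders_2019_11": date(2019, 11, 30),  # modified yesterday -> active
    "orders_2019_10": date(2019, 10, 31),  # modified ~30 days ago -> active
    "orders_2018_06": date(2018, 6, 30),   # untouched for over a year -> long-term
}
active, long_term = classify_partitions(partitions, today)
print(active, long_term)
```

With monthly partitioning, each insert resets the clock only on the current month’s partition, so older partitions age into the cheaper long-term tier untouched.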
It is important to note that the logic used for partitioning tables has to be chosen carefully after a thorough evaluation of the queries that are required to run.
Let’s dig one level deeper into how storage utilization is calculated, using a simple example. Assume you are creating a table, sample_table, with 2 columns and 10 rows.
- Column A – Integer
- An integer field uses 8 bytes of storage
- Column B – Numeric
- A numeric field uses 16 bytes of storage
For 10 rows inserted into the sample_table, the total storage utilized is:
(8 Bytes * 10) + (16 Bytes * 10) = 240 bytes
You get charged for 240 bytes. At the time of writing this article, Google doesn’t charge users for the first 10 GB of storage, so if this were your only table, you wouldn’t see a charge.
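The arithmetic above generalizes to any table once you know the per-column byte sizes. A minimal Python sketch, using the sizes from the example:

```python
# Per-value storage sizes used in the worked example
INTEGER_BYTES = 8
NUMERIC_BYTES = 16


def table_storage_bytes(row_count, column_sizes):
    """Total logical storage: the sum of per-column sizes, times the row count."""
    return row_count * sum(column_sizes)


# sample_table: Column A (Integer) + Column B (Numeric), 10 rows
total = table_storage_bytes(10, [INTEGER_BYTES, NUMERIC_BYTES])
print(total)  # 240, matching the worked example above
```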
This pricing effectively means that a lot of small businesses can technically adopt a fully managed data warehouse without having to pay a hefty bill.
The table below lists the various data types that BigQuery supports and the storage utilized for each data type.
|Data type|Storage size|
|---|---|
|INTEGER|8 bytes|
|NUMERIC|16 bytes|
|STRING|2 bytes + the UTF-8 encoded string size|
|BYTES|2 bytes + the number of bytes in the value|
|STRUCT/RECORD|0 bytes + the size of the contained fields|
|GEOGRAPHY|16 bytes + 24 bytes * the number of vertices in the geography type (you can verify the number of vertices using the ST_NumPoints function)|
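The variable-length types in the table make per-row sizes data-dependent. The sketch below estimates a row’s storage from fixed-width columns plus STRING values; it covers only the types shown above and is an illustration, not an official calculator:

```python
# Per-value storage sizes for fixed-width types, per the table above
FIXED_TYPE_BYTES = {"INTEGER": 8, "NUMERIC": 16}


def string_bytes(value: str) -> int:
    """STRING columns: 2 bytes of overhead plus the UTF-8 encoded size."""
    return 2 + len(value.encode("utf-8"))


def row_bytes(fixed_columns, string_values):
    """Estimate one row's storage from fixed-width columns and STRING values."""
    fixed = sum(FIXED_TYPE_BYTES[t] for t in fixed_columns)
    return fixed + sum(string_bytes(s) for s in string_values)


# e.g. a row with an INTEGER id, a NUMERIC amount, and a short STRING name;
# "açaí" is 6 bytes in UTF-8, so the row is 8 + 16 + (2 + 6) = 32 bytes
print(row_bytes(["INTEGER", "NUMERIC"], ["açaí"]))
```

Note how multi-byte UTF-8 characters inflate STRING storage beyond the character count, which matters for tables dominated by text columns.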
Please review the pricing summary page for up-to-date information on storage pricing. For an accurate assessment of storage costs, it is prudent to rely on the official Google Cost Estimator.
BigQuery Storage API Pricing:
Customers using the BigQuery Storage API are charged for on-demand usage based on the data processed by read_rows operations. Customers on the flat-rate model receive a specific amount of credits, beyond which they are charged. Refer to the BigQuery Storage API section of the official pricing document.
Streaming API Inserts:
BigQuery data loads are free, with the exception of streaming data. Streaming inserts incur a fee based on the volume of data streamed into BigQuery. Details can be found in the Streaming API section of the official pricing document.
Similar to the storage costs, there are two pricing models available for querying BigQuery.
- On-demand — This pricing model is the most flexible and is well suited for sporadic workloads. Pricing is based on the amount of data processed by each query you run.
- Flat-rate — This pricing model is best suited for customers who have a predictable workload and a set budget. In this model, customers are not charged per query and instead pay a flat rate for a dedicated amount of query-processing resources.
Refer to the official Google pricing guide, Google Cloud SKUs, for up-to-date pricing.
The data a query will process can be estimated before running it by using the BigQuery query editor, which displays the bytes to be scanned as you compose the query.
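Once you have the bytes-scanned estimate from the editor, converting it to an on-demand cost is simple arithmetic. A sketch with the per-TB price left as a parameter, since the published rate changes over time and should be taken from the official pricing page:

```python
TIB = 1024 ** 4  # on-demand query pricing is quoted per terabyte scanned


def on_demand_cost(bytes_processed: int, price_per_tib: float) -> float:
    """Estimated on-demand charge for a single query, given a per-TB price."""
    return bytes_processed / TIB * price_per_tib


# e.g. a query scanning 500 GiB at a hypothetical $5.00/TB rate
cost = on_demand_cost(500 * 1024 ** 3, price_per_tib=5.00)
print(round(cost, 2))  # 2.44
```

This is also a handy way to compare a month of estimated on-demand charges against a flat-rate commitment when deciding between the two models.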
Google Cost Estimator:
Scenario planning for BigQuery cloud costs can easily be performed using the Google Cost Estimator tool.
Now that you are ready to start loading data into BigQuery, your next task is to figure out how to move data there. You can leverage native tools like the data transfer services provided by Google Cloud to fulfill some of your data replication requirements. However, these tools don’t support every data source you may need, and it is almost certain that you will have to look for alternatives.
Our cloud-based data pipeline, Daton, provides a simple yet cost-effective way to replicate your data to BigQuery. Daton has a variety of pre-built adapters for databases, SaaS applications, files, webhooks, marketing applications, and more. Replicate your data from any source to BigQuery in three simple steps, without writing any code, in a matter of minutes.
Sign up for a free trial today to kick-start your data warehousing initiative.
Not ready yet? – Talk to our data architects who are happy to answer your questions.
Send us a note – we love to hear from you!