Pros and Cons of Amazon Redshift
Before deciding whether Amazon Redshift is right for you, it is essential to understand its pros and cons.
What is Amazon Redshift?
Amazon Web Services (AWS) was the first public cloud provider to offer a cloud-based, petabyte-scale data warehousing service. The service is called Amazon Redshift, and it is one of the most widely used cloud data warehouses.
Amazon claims thousands of businesses as its clients, but rivalry in this field is now growing, with Google BigQuery, Snowflake, and Oracle Autonomous Data Warehouse all vying for a share of the growing cloud data warehouse market.
Amazon Redshift has been around since the beginning of 2013 and has undergone several enhancements during this time. Amazon Redshift Spectrum, Amazon Athena, and the omnipresent, massively scalable data storage solution, Amazon S3, complement Amazon Redshift and together offer all the technologies needed to build a data warehouse or data lake on an enterprise scale. Let us dig a little deeper to understand the pros and cons of Amazon Redshift in more detail.
Amazon Redshift Pros
- Widely adopted – As one of the first cloud-native data warehousing technologies, Amazon Redshift has a thriving and robust customer base. A healthy ecosystem of knowledgeable resources is available to support organizations in extracting value from their data warehousing initiatives.
- Ease of Administration – Amazon Redshift offers tooling to reduce the administrative burden typically involved in running a database. Tooling is available to create clusters easily, to automate database backups, and to scale the data warehouse up and down. All these activities required database administrators in the past. With the out-of-the-box tooling available in Amazon Redshift, users can click a few buttons or call REST APIs to accomplish these tasks.
- Ideal for Data Lakes – Amazon Redshift Spectrum extends the capability of Redshift by allowing the system to scale compute and storage independently of each other and to issue queries on data stored in S3 buckets.
- Ease of Querying – Amazon Redshift has a query language similar to that of the immensely popular PostgreSQL. Anyone familiar with PostgreSQL can use their existing SQL skills to start working with a Redshift cluster. JDBC and ODBC support lets developers connect to their Redshift clusters using the DB query tool of their liking. The Redshift console also provides an option for users to issue queries and work on the database. However, power users may prefer to use a tool of their choice. Most business intelligence tools in the market today support Amazon Redshift.
- Columnar Storage – When rows get inserted into a relational database, they typically get stored in a row format. Although row formats are very efficient for write operations, they underperform on analytical reads that touch only a few columns across many rows. Columnar storage keeps the values of each column together, and because the values within a column are often repetitive or missing, they compress more efficiently than the mixed values of a row. By compressing column data, the storage footprint on disk reduces significantly. A query issued on a set of columns can then scan a smaller footprint of data and transfer a lower volume of data over the network or I/O subsystem to the compute node for processing, leading to a significant improvement in the performance of analytical query processing.
- Performance – Amazon Redshift is an MPP (Massively Parallel Processing) database. Efficient implementation of columnar storage algorithms and data partitioning techniques gives Amazon Redshift an edge in terms of performance.
- Scalability – The ability to scale is one of the most important aspects of a database, and Amazon Redshift is no different. In comparison to scaling an on-premises database, scaling a Redshift cluster is a piece of cake. Internal complications involving hardware expansion, VM resizing, and rebalancing of data among the nodes are entirely handled by Redshift and hidden behind a UI button or a REST API call.
- Pricing – Many factors contribute to the final price of an Amazon Redshift cluster. For anyone considering Amazon Redshift as their data warehouse of choice, it is essential to understand these factors in detail to avoid any surprises later. You can read a more in-depth article on Amazon Redshift pricing here. With a wide variety of pricing models and flexibility in terms of deployment, Amazon Redshift offers something for every company, regardless of size.
- Security – Security is a significant roadblock in the adoption of cloud services for many companies. However, it is essential to realize that cloud services, when appropriately configured, can offer a higher degree of protection than the security setups that many internal IT teams are able to build. The scale of public clouds enables them to hire more resources and deploy them to monitor and secure the cloud environment 24x7x365. Amazon Web Services is no different. Amazon Redshift security cannot be discussed in isolation: the security capabilities offered by Amazon Redshift are available to users on top of the security implemented at the cloud services layer. Robust identity and access management, role-based access control (RBAC), encryption in transit and at rest, and SSL connections are some of the security features in Redshift. You can read more about them here. AWS services like Redshift fall within the scope of compliance programs such as HIPAA, SOC 2 Type II, FedRAMP, and PCI. More details about the certification process are available here.
- Strong AWS Ecosystem – If you are considering Amazon Redshift as your data warehouse, it is quite likely that you already have some environments running on AWS. As important as it is to select best-of-breed applications for your workloads, it is also essential to factor in other aspects like community support, pricing and discounting, and the skill set within the company. Selecting a technology often has both strategic and tactical implications. This may not matter much for smaller organizations, but larger organizations with well-established teams have to weigh these factors before deciding on any software purchase, and that includes the selection of a data warehouse. With a wide variety of services on offer in AWS, organizations can benefit from bundling their services to get higher discounts for services used.
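The columnar storage point in the list above can be made concrete with a small sketch. It pivots a handful of rows into columns and compresses one column with run-length encoding, chosen here only because it is a simple scheme to illustrate; it is not a description of how Redshift stores data internally. The sample table and the function names (`to_columns`, `run_length_encode`, `count_matching`) are assumptions made up for this example.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, country, status) rows.
rows = [
    (1, "US", "shipped"),
    (2, "US", "shipped"),
    (3, "US", "pending"),
    (4, "DE", "pending"),
    (5, "DE", "shipped"),
    (6, "DE", "shipped"),
]


def to_columns(rows, names):
    """Pivot row tuples into a dict of column name -> list of values."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_matching(encoded, target):
    """Count occurrences of target by reading only the encoded column."""
    return sum(count for value, count in encoded if value == target)


columns = to_columns(rows, ["order_id", "country", "status"])

# Six stored country values shrink to two (value, count) pairs.
country_rle = run_length_encode(columns["country"])

# A query such as "how many orders came from the US" touches only the
# compressed country column and never reads order_id or status.
us_orders = count_matching(country_rle, "US")
print(country_rle, us_orders)
```

In a row layout the same query would have to read every field of every row, which is the extra I/O the columnar approach avoids.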
Amazon Redshift Cons
Amazon Redshift is a data warehousing system by design. The entire service is tuned and optimized for a specific type of workload: analytical data processing. If you are interested in a database that does efficient transaction processing, then AWS has other services, such as Amazon Aurora, Amazon RDS, and DynamoDB, that you may want to consider.
Not a multi-cloud solution: While the ecosystem plays a vital role in driving the choice of software, a lack of choice is seen as a mechanism used by the software vendor to lock customers into its service offerings. Amazon Redshift, unlike Snowflake, is only available on AWS. If you are a user of Azure, GCP, or Oracle Cloud, then carefully evaluate the solutions offered by those cloud providers before deciding to go with Amazon Redshift.
Amazon Redshift is not 100% managed – Although the tooling provided by Amazon reduces the need for a full-time database administrator, it does not eliminate the need for one. Amazon Redshift is known to have issues with handling storage efficiently in an environment prone to frequent deletes. Maintaining sort order is also a critical factor in achieving efficient performance. These aspects of the database are generally not well known to developers, and one would argue that they should not have to care. And they would be right.
Current improvements in database technology can eliminate the need for users to understand these database administration topics, with the database managing itself to deliver optimum performance without ever needing a database administrator. Snowflake and Oracle Autonomous Data Warehouse have made massive strides in this regard. Amazon Redshift has already released a slew of features like automatic table sort, automatic vacuum delete, and automatic analyze, which certainly demonstrate progress on this front.
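To illustrate why frequent deletes and unsorted inserts create maintenance work, here is a toy model, assuming a table where a delete only marks a row and where new rows are appended at the end. The `ToyTable` class and its methods are hypothetical and deliberately simplified; they are not Redshift's storage engine, only a sketch of the kind of housekeeping that a vacuum-style operation performs.

```python
class ToyTable:
    """Toy table: deletes only mark rows, inserts append in arrival order."""

    def __init__(self):
        self.rows = []        # stored values in physical order
        self.deleted = set()  # physical positions marked as deleted

    def insert(self, value):
        self.rows.append(value)

    def delete(self, value):
        # Mark matching rows; the space they occupy is not reclaimed yet.
        for pos, stored in enumerate(self.rows):
            if stored == value:
                self.deleted.add(pos)

    def live_rows(self):
        return [v for pos, v in enumerate(self.rows) if pos not in self.deleted]

    def stored_row_count(self):
        return len(self.rows)

    def is_sorted(self):
        live = self.live_rows()
        return live == sorted(live)

    def vacuum(self):
        # Reclaim space from deleted rows and restore sort order.
        self.rows = sorted(self.live_rows())
        self.deleted = set()


table = ToyTable()
for value in [10, 20, 30, 40]:
    table.insert(value)
table.delete(20)   # row is marked, but still occupies storage
table.insert(15)   # arrives out of order at the end of the table

stored_before = table.stored_row_count()
sorted_before = table.is_sorted()

table.vacuum()

stored_after = table.stored_row_count()
sorted_after = table.is_sorted()
print(stored_before, sorted_before, stored_after, sorted_after)
```

Before the vacuum step the table holds more rows than are visible and its live rows are out of order; afterward both problems are gone. Automating this step is what features such as automatic vacuum delete and automatic table sort aim to do.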
Pricing: Modern data warehouses like Snowflake and Google BigQuery are capable of charging customers on a per-second basis or based on the amount of data processed to handle a request. For on-demand workloads, a pricing scheme whose smallest billing unit is an hour, as it is with Amazon Redshift, impacts pricing significantly. On the other hand, AWS also offers a discount for committed capacity that may bring down the price and make it comparable to other cloud data warehousing services.
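The effect of billing granularity can be shown with a little arithmetic. The sketch below compares the cost of a short job when every started hour is billed in full with the cost when billing is per second. The hourly rate and the job duration are made-up numbers for illustration, not actual prices from any vendor, and the same rate is used for both schemes so that only the granularity differs.

```python
import math


def hourly_billed_cost(runtime_seconds, rate_per_hour):
    """Cost when every started hour is billed in full."""
    billed_hours = math.ceil(runtime_seconds / 3600)
    return billed_hours * rate_per_hour


def per_second_billed_cost(runtime_seconds, rate_per_hour):
    """Cost when only the seconds actually used are billed."""
    return runtime_seconds * rate_per_hour / 3600


# Hypothetical numbers: a 10-minute job on a cluster costing 4.00 per hour.
RATE_PER_HOUR = 4.00
runtime_seconds = 10 * 60

hourly_cost = hourly_billed_cost(runtime_seconds, RATE_PER_HOUR)
per_second_cost = per_second_billed_cost(runtime_seconds, RATE_PER_HOUR)

print(f"hourly billing:     {hourly_cost:.2f}")
print(f"per-second billing: {per_second_cost:.2f}")
```

Under these assumptions the short job costs six times as much with hourly billing. The gap narrows as jobs approach a full hour, which is why granularity matters most for bursty, on-demand workloads.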
Concurrent execution is a known challenge in MPP databases. In an environment where multiple concurrent users are executing queries, Redshift could run into performance problems. Because compute and storage are not separated, read workloads are quite likely to be impacted by substantial write activity in the database, such as a massive batch processing job.
Cluster resize causes a disruption in the service to the end user. Although the disruption is minimal, the lack of a seamless cluster resize capability can be considered a drawback in a market where competitors offer the ability to scale up and down without downtime. For most businesses, this minor disruption is tolerable, but it is an issue nevertheless.
Choice of keys impacts performance and price – In the cloud world, performance = price. Users have to carefully design their strategies around distribution keys and sort keys while keeping an eye on future requirements. They should also reassess the validity of their sort key and distribution key choices regularly as more data gets ingested into the Amazon Redshift data warehouse. A sub-optimal design can increase the costs of the Redshift data warehouse because system performance degrades, which in turn causes user satisfaction issues. It is easy to increase the cluster size to deal with the problem, but that would increase your costs. A careful key strategy allows companies to get the most out of their Amazon Redshift investment before needing to scale up.
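A rough sketch can show why the choice of distribution key matters. It assumes that rows are assigned to nodes by hashing the key value, which is a simplification for illustration and not a description of Redshift's actual hashing. The sample data, the four-node cluster, and the helper names (`assign_node`, `rows_per_node`, `skew_ratio`) are all hypothetical.

```python
import hashlib
from collections import Counter


def assign_node(key_value, node_count):
    """Map a distribution-key value to a node using a stable hash."""
    digest = hashlib.sha256(str(key_value).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def rows_per_node(rows, key, node_count):
    """Count how many rows land on each node for a given key column."""
    counts = Counter(assign_node(row[key], node_count) for row in rows)
    return [counts.get(node, 0) for node in range(node_count)]


def skew_ratio(per_node):
    """Largest node load divided by the mean load; 1.0 is perfectly even."""
    mean = sum(per_node) / len(per_node)
    return max(per_node) / mean


# Hypothetical orders table: 1000 rows, 90% of them from a single country.
orders = [
    {"order_id": i, "country": "US" if i % 10 else "DE"}
    for i in range(1000)
]
NODE_COUNT = 4

by_country = rows_per_node(orders, "country", NODE_COUNT)
by_order_id = rows_per_node(orders, "order_id", NODE_COUNT)

print("distributed by country: ", by_country, round(skew_ratio(by_country), 2))
print("distributed by order_id:", by_order_id, round(skew_ratio(by_order_id), 2))
```

With the low-cardinality `country` key, at most two of the four nodes receive any rows, so the busiest node does most of the work while the others sit idle. The high-cardinality `order_id` key spreads rows far more evenly. Adding nodes would not fix the skewed layout, which is how a poor key choice turns into a higher bill.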
The leader node (the cluster's master node) plays a critical role in the Redshift architecture by orchestrating the allocation and execution of queries and the aggregation of their results. All clients interact only with the leader node, and therefore a non-redundant leader node creates a single point of failure for the environment.
Not a serverless architecture: Amazon Redshift belongs to the old guard of cloud data warehouses. Naturally, Redshift also suffers from some of the limitations that come with having been designed many years ago. A serverless architecture enables the vendor to achieve a higher degree of hardware optimization, which in turn translates into lower prices for customers. When the same hardware is shared by three customers instead of one, the price is naturally going to come down. The old guard benefits from having been around, and having innovated, for a long time. These benefits sometimes outweigh the perceived drawbacks, and sometimes they do not.
In conclusion, the choice of a data warehouse depends on your use case, your budget, the current state of the business, and your plans for using the data warehouse. We do not believe that there is an absolute right or wrong choice when it comes to technology selection. Feel free to reach out to us if you have questions about which data warehouse is a good fit for your business. Our data architects can guide you toward the decision that is right for your business.
At Saras Analytics, we firmly believe in the power of data and how organizations of all sizes can now benefit from the rapid innovations in cloud data warehousing technologies. Read our article on why we believe it is time for every company to own and operate a data warehouse.
Our cloud-based data pipeline, Daton, provides a simple yet cost-effective way to replicate your data to Amazon Redshift. Daton has 100+ pre-built adapters for databases, SaaS applications, files, webhooks, marketing applications, and more. Replicate your data from any source to Amazon Redshift in three simple steps and in a matter of minutes, without having to write any code.