This blog was originally written by Keith Smith and updated for 2024 by Justin Delisi.
Snowflake’s Data Cloud has emerged as a leader in cloud data warehousing. As a fundamental piece of the modern data stack, Snowflake is helping thousands of businesses store, transform, and derive insights from their data more easily, quickly, and efficiently than ever before.
At phData, we’ve had the pleasure of helping many of those businesses succeed with the platform, which has given us a unique look into the Data Cloud, especially into how much it costs.
In this blog, we’ll explain what makes up the Snowflake Data Cloud, how some of its key components work, and finally offer some estimates of how much it will cost your business to utilize Snowflake.
What is the Snowflake Data Cloud?
The Snowflake Data Cloud was unveiled in 2020 as the next iteration of Snowflake’s journey to simplify how organizations interact with their data. The Data Cloud applies technology to solve data problems that exist for every customer, namely: availability, performance, and access.
Simplifying how everyone interacts with their data lowers the barrier to entry by providing a consistent experience anywhere around the globe.
The primary objective of this idea is to democratize data and make it transparent by breaking down data silos that cause friction when solving business problems.
In addition to breaking down internal data silos, Snowflake unlocks the ability to break down external data silos, which accelerates partnerships and efficiency via data sharing and data exchange.
What Components Make up the Snowflake Data Cloud?
The Snowflake Data Cloud is new terminology, but breaking down each of its components helps us understand the complete data solution:
- Cloud Data Warehouse
  - Compute isolation
  - Connect existing tools and reports
  - Be the center of your Business Intelligence Strategy
- Cloud Data Lake
  - Centralized repository to store any type of data
    - Structured
    - Unstructured
- Data Engineering
  - Build reliable data pipelines with Snowflake automation (see the sketch after this list)
    - Streams
    - Tasks
    - Snowpipe
- Data Science
  - Prepare, standardize, and serve data for building models
    - Feature Store
    - Experiment and Coefficient History
- Data Applications
  - Availability of data and compute is taken care of
  - Access and store your data anywhere across clouds
- Data Exchange and Sharing
  - Access external datasets as if they were your own, without having to move or ingest the data
  - Share your data inside or outside the business with security guaranteed
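To make the Data Engineering piece more concrete, here is a minimal sketch of the Streams and Tasks pattern, run through the snowflake-connector-python package. Every object name, credential, and schedule below is a hypothetical placeholder, not a recommendation.

```python
# Minimal sketch of change-data capture with a Stream plus a scheduled Task.
# Every identifier and credential below is a hypothetical placeholder.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account",
    user="your_user",
    password="your_password",
    warehouse="TRANSFORM_WH",
    database="DEMO_DB",
    schema="RAW",
)
cur = conn.cursor()

# A stream records inserts, updates, and deletes on a source table.
cur.execute("CREATE OR REPLACE STREAM ORDERS_STREAM ON TABLE RAW_ORDERS")

# A task wakes on a schedule but only runs when the stream actually has data,
# so an idle pipeline does not burn warehouse credits.
cur.execute("""
    CREATE OR REPLACE TASK LOAD_ORDERS_TASK
      WAREHOUSE = TRANSFORM_WH
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('ORDERS_STREAM')
    AS
      INSERT INTO ANALYTICS.ORDERS (ORDER_ID, CUSTOMER_ID, AMOUNT)
      SELECT ORDER_ID, CUSTOMER_ID, AMOUNT
      FROM ORDERS_STREAM
      WHERE METADATA$ACTION = 'INSERT'
""")

# Tasks are created suspended; resume the task to start the schedule.
cur.execute("ALTER TASK LOAD_ORDERS_TASK RESUME")
```

Snowpipe covers the continuous file-ingestion side of the same story; its cost implications come up later in this post.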
What is a Cloud Data Mesh?
Since 2022, the term data mesh has become increasingly popular with Snowflake and across the broader industry. This data architecture aims to solve many of the problems that have plagued enterprises for years.
Rather than focusing on the individual data consumers of your enterprise data strategy, a data mesh focuses on how data is managed and designed to drive business value. The main goals of a data mesh structure are:
- Domain-driven ownership
- Data as a product
- Self-service infrastructure
- Federated governance
One of the primary challenges that organizations face is data governance. As organizations grow in both complexity and data production/consumption, a data governance strategy needs to be designed as part of their information architecture.
A data mesh strategy, combined with the end consumers of your data cloud, enables your business to scale effectively, securely, and reliably without sacrificing speed-to-market.
What is a Cloud Data Warehouse?
A cloud data warehouse takes a concept every organization knows, the data warehouse, and optimizes its components for the cloud.
As an example, an IT team could easily take its knowledge of on-premises database deployment and deploy the same solution in the cloud on an always-running virtual machine.
This is “lift-and-shift”: while it works, it doesn’t take full advantage of the cloud. For example, most data warehouse workloads peak during certain times, say during business hours. Lift-and-shift models mean you continue to pay for compute resources even when they are not being used.
Additionally, it suffers from the same heavy operational and performance burdens as on-premises offerings, namely contention from multiple users on the system as well as requiring operating system and disk maintenance.
Since the cloud offers the promise of elasticity, the ideal solution:
- Scales automatically, regardless of usage, with minimal contention
- Requires no hardware (virtual or physical) to select, install, configure, or manage
- Requires virtually no software to install, configure, or manage
- Leaves ongoing maintenance, management, upgrades, and tuning to Snowflake
To enable this vision, Snowflake modernized the architecture by offering the following:
- Single source for all data regardless of data type
- Available worldwide
- Available on any cloud
- Separate compute and storage
- Elastically scale compute (see the sketch after this list)
- Elastically scale storage
- Usage-based pricing model
- Always available metadata
- Snowflake maintains your metadata while listening for work to do
- Always available compute
- Snowflake maintains a pool of instances ready to serve your queries
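As a concrete illustration of the separated, elastically scaled compute described above, here is a minimal sketch of a virtual warehouse that suspends itself when idle and can be resized on demand. The warehouse name and connection details are hypothetical placeholders.

```python
# Minimal sketch of elastic compute: an auto-suspending warehouse that can be
# resized on the fly. Names and credentials are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password"
)
cur = conn.cursor()

# Warehouses bill only while running; AUTO_SUSPEND stops the meter after
# 60 seconds of inactivity and AUTO_RESUME restarts it on the next query.
cur.execute("""
    CREATE WAREHOUSE IF NOT EXISTS ANALYTICS_WH
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 60
      AUTO_RESUME = TRUE
""")

# Scale up for a heavy batch window, then back down, without moving any data.
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'LARGE'")
cur.execute("ALTER WAREHOUSE ANALYTICS_WH SET WAREHOUSE_SIZE = 'XSMALL'")
```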
What is a Data Warehouse?
A data warehouse is a centralized and structured storage system that enables organizations to efficiently store, manage, and analyze large volumes of data for business intelligence and reporting purposes.
What is a Data Lake?
A Data Lake is a location to store raw data that is in any format that an organization may produce or collect. Effectively this is a way to store the source of truth and build (or rebuild) your downstream data products (including data warehouses) from it.
What is the Difference Between a Data Lake and a Data Warehouse?
Historically, there were big differences. If you go back to 2014, data warehouse platforms were built using legacy architectures that had drawbacks when it came to cost, scale, and flexibility.
The data lake platforms, meanwhile, were built using more modern architectures and featured easier scale and lower cost. As a result, if you had lots of data (which can often happen with raw or unstructured data), you’d typically go with the data lake platform.
Today, data lakes and data warehouses are colliding. Snowflake was founded on the capability to handle data warehouse workloads in the cloud. Since this has been wildly successful, they have also been able to tackle processing unstructured data natively on the platform.
When using a platform like Snowflake, there is effectively no difference between a data lake and a data warehouse. With Snowflake’s VARIANT data type, 90% of data lake use cases can be solved natively. This removes the barrier that file-based data lakes present, because you no longer have to worry about file formats or compression and can instead focus on data access and management.
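Here is a minimal sketch of that VARIANT pattern: land semi-structured JSON in a single column, then query nested fields directly. Table, column, and connection names are hypothetical placeholders, and in practice the data would usually arrive via COPY INTO or Snowpipe rather than a literal INSERT.

```python
# Minimal sketch of the VARIANT pattern: store raw JSON in one column and
# query it with path notation. All identifiers are hypothetical placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password",
    warehouse="ANALYTICS_WH", database="DEMO_DB", schema="DEMO_SCHEMA",
)
cur = conn.cursor()

# One column holds the raw JSON document, whatever its shape.
cur.execute("CREATE TABLE IF NOT EXISTS RAW_EVENTS (payload VARIANT)")

# Insert a sample document (in practice this comes from COPY INTO or Snowpipe).
cur.execute("""
    INSERT INTO RAW_EVENTS
    SELECT PARSE_JSON('{"user": {"id": 42, "plan": "pro"}, "event": "login"}')
""")

# Query nested fields directly; no file formats or compression to manage.
cur.execute("""
    SELECT payload:user.id::NUMBER AS user_id,
           payload:event::STRING   AS event_type
    FROM RAW_EVENTS
""")
print(cur.fetchall())
```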
Transition to the Data Cloud
With multiple ways to interact with your company’s data, Snowflake has built a common access point that brings data lake access, data warehouse access, and data sharing access together into one protocol.
This is why we believe the traditional definitions of data management will change as the platform handles each type of data requirement natively.
What Kinds of Workloads Does Snowflake Handle?
- Data Warehousing: Snowflake is primarily built for data warehousing workloads, providing a centralized repository for storing and managing structured and semi-structured data from various sources.
- Data Analytics: It supports complex data analytics workloads, enabling organizations to run ad-hoc queries, perform data exploration, and generate insights from their data.
- Data Processing: Snowflake can process large datasets and perform data transformations, making it suitable for ETL (Extract, Transform, Load) processes.
- Business Intelligence (BI): It facilitates BI workloads by providing tools and capabilities for generating reports, dashboards, and visualizations to support decision-making.
- Data Sharing: Snowflake allows organizations to securely share data with external partners or customers, making it useful for collaboration and data monetization (see the sketch after this list).
- Advanced Analytics: Snowflake can integrate with various data science and machine learning tools, allowing organizations to perform advanced analytics and build predictive models on their data.
- Real-time Data: Snowflake can ingest and process real-time data streams for applications requiring up-to-the-minute insights.
- Large-scale Data Storage: Snowflake can scale to accommodate massive datasets, making it suitable for organizations with significant data storage needs.
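For the data sharing workload flagged above, here is a minimal sketch of a secure share: the consumer queries the shared objects in place, and no data is copied or moved. All database, table, share, and account names are hypothetical placeholders, and creating shares requires a role with the appropriate privileges (e.g., ACCOUNTADMIN).

```python
# Minimal sketch of secure data sharing: grant read access to a share and
# attach a consumer account. Every identifier is a hypothetical placeholder.
import snowflake.connector

conn = snowflake.connector.connect(
    account="your_account", user="your_user", password="your_password"
)
cur = conn.cursor()

for stmt in [
    "CREATE SHARE SALES_SHARE",
    "GRANT USAGE ON DATABASE SALES_DB TO SHARE SALES_SHARE",
    "GRANT USAGE ON SCHEMA SALES_DB.PUBLIC TO SHARE SALES_SHARE",
    "GRANT SELECT ON TABLE SALES_DB.PUBLIC.ORDERS TO SHARE SALES_SHARE",
    # The consumer account identifier comes from your partner or customer.
    "ALTER SHARE SALES_SHARE ADD ACCOUNTS = PARTNER_ACCOUNT",
]:
    cur.execute(stmt)
```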
How Much Does Snowflake Cost?
This is one of the most common questions we encounter, because switching from a fixed-cost pricing model to a usage-based pricing model can cause significant heartburn. We have thoughts on how to control and estimate costs with Snowflake, but here we aim to give you a rough cost estimate.
Since every situation is different, we are going to provide estimates and ranges based on data size and quantity.
Before diving into the actual costs, it is worth noting that Snowflake is also beneficial for customers that are smaller than our “Small” bucket and larger than our “Large” bucket. These t-shirt-size buckets were selected because they are the typical sizes we see customers initially choose when engaging with phData, but many Snowflake customers start out with much less than 5 TB of data and lower spend.
On the flip side, we have seen “Large” customers mature and require larger spend due to security, data localization, or even data replicated across different clouds. Each of these can increase the cost significantly, especially as you scale out beyond your initial Snowflake instance or require higher-tier features.
Costs are also determined by whether you’re using managed services (Snowpipe, Streams, Tasks) and by how real-time your data is (streaming vs. batch). Calculating costs for continuous ingestion using Snowpipe has many different layers.
How do Snowflake Costs Change Over Time?
All Snowflake customers begin their journey with an empty environment. This means that before a customer can start realizing the benefits that the Snowflake platform offers, they must ingest data from source systems and train and onboard developers.
With this in mind, there is a typical learning curve (or ramp-up time) that is required to use any product, and Snowflake is no different. We typically see early work begin as handwritten code and shift to automation to help scale migrations, transformations, and user onboarding.
We also observe that the typical customer’s compute costs change over time. Snowflake makes this visible in a way that is unique to the platform: everyone can see which part of the process consumes the most credits, and therefore where the money is being spent.
Since ingest pipelines are 1-to-1 and reporting/analytics are 1-to-many, we witness a shift in spending as customers mature.
It is phData’s perspective that the data culture inside an organization matures by building increasingly valuable assets inside a data platform that delivers results to end users (analysts, BI, reports, data science, etc.). As this happens, spending shifts from ELT to analytics.
This is why successful customers spend proportionally more on analytic workloads because they are in such high demand and driving business objectives.
Sizing the Cost of Snowflake
Snowflake bills compute per second, with a 60-second minimum each time a warehouse resumes.
Snowflake also acts as a serverless compute layer, where the virtual warehouses doing the work can be turned on and off many times over the course of a day. To simplify this discussion and smooth out assumptions across a longer time period, we typically estimate how many hours a day a virtual warehouse cluster needs to be on, which is why the following sections state hours per day alongside per-hour credit rates (a worked example follows the “Small” tables below).
Additionally, since Snowflake offers the unique ability to track costs for each step of the data lifecycle, we are able to better understand what type of compute requirements a customer will have and plug that into our calculations.
Small
Customers in this range typically spend between $25k-$75k.
To get to this amount we make the following assumptions:
- 10-20 analytics users
- 10-20 ELT pipelines
- Under 5 TB of data
- Most work is done by analysts during business hours
Here is a more concrete example from one of our customers:
- Small and Large-sized warehouses to perform ELT work
- Medium-sized warehouses for analytics
| Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Total Cost per Day |
|---|---|---|---|---|---|
| Small | 3 | 1 | 2 | $3 | $18 |
| Medium | 2 | 4 | 4 | $3 | $96 |
| Large | 1 | 1 | 8 | $3 | $24 |

| Storage Size | Annual Storage Cost | Annual Compute Cost | Annual Total Cost |
|---|---|---|---|
| 5 TB | $1,380 | $50,370 | $51,750 |
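To make the arithmetic behind these tables explicit, here is a small sketch that rolls daily warehouse usage and storage up to an annual estimate using the “Small” numbers above. The $3-per-credit and roughly $23-per-TB-per-month rates are the assumptions implied by the tables, not quoted prices for every Snowflake edition or region.

```python
# Minimal sketch of the estimate behind the "Small" tables above. The rates
# are assumptions implied by those tables, not universal list prices.

CREDIT_PRICE = 3.00    # dollars per credit (assumed)
STORAGE_PRICE = 23.00  # dollars per TB per month (implied by the table)

# (cluster size, cluster count, hours on per day, credits per hour)
warehouses = [
    ("Small",  3, 1, 2),
    ("Medium", 2, 4, 4),
    ("Large",  1, 1, 8),
]

daily_compute = sum(
    count * hours * credits * CREDIT_PRICE
    for _, count, hours, credits in warehouses
)                                        # $138 per day
annual_compute = daily_compute * 365     # $50,370
annual_storage = 5 * STORAGE_PRICE * 12  # 5 TB -> $1,380

print(f"Compute: ${annual_compute:,.0f}  Storage: ${annual_storage:,.0f}  "
      f"Total: ${annual_compute + annual_storage:,.0f}")  # Total: $51,750
```

The Medium and Large examples that follow use the same formula with larger warehouse counts, daily hours, and storage.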
Medium
Customers in this range typically spend between $100k-$200k.
To get to this amount we make the following assumptions:
- 30-50 analytics users
- 30-50 ELT pipelines
- Under 50 TB of data
- Most work is done by analysts during business hours
Here is a more concrete example from one of our customers:
- Small and Large-sized warehouses to perform ELT work
- Medium-sized warehouses for analytics
| Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Total Cost per Day |
|---|---|---|---|---|---|
| Small | 5 | 2 | 2 | $3 | $60 |
| Medium | 4 | 6 | 4 | $3 | $288 |
| Large | 2 | 2 | 8 | $3 | $96 |

| Storage Size | Annual Storage Cost | Annual Compute Cost | Annual Total Cost |
|---|---|---|---|
| 50 TB | $13,800 | $162,060 | $175,860 |
Large
Customers in this range typically spend between $300k-$500k.
To get to this amount we make the following assumptions:
- 100+ analytics users
- 100s – 1000s of ELT pipelines
- 100+ TB of data
- Work being done around the clock
Here is a more concrete example from one of our customers:
- Small, Medium, and Large-sized warehouses to perform ELT work
- Ensure workloads are right-sized
- Medium and Large-sized warehouses for analytics
| Cluster Size | Cluster Count | Hours per Day | Credits per Hour | Credit Price | Total Cost per Day |
|---|---|---|---|---|---|
| Small | 10 | 2 | 2 | $3 | $120 |
| Medium | 10 | 6 | 4 | $3 | $720 |
| Large | 5 | 2 | 8 | $3 | $240 |

| Storage Size | Annual Storage Cost | Annual Compute Cost | Annual Total Cost |
|---|---|---|---|
| 200 TB | $55,200 | $394,200 | $449,400 |
Why Does This Matter?
Data and demand for information have been increasing exponentially since the dawn of the information age.
Snowflake has been able to get in front of and understand the unique challenges that this growth and demand have presented to businesses across the globe.
Enabling access to data when you want it, where you want it, without delay is a core principle of their success.
We hope that distilling each of these terms and paradigm shifts helps educate you on your journey to solve your business objectives.
If you are interested in learning more about Snowflake, phData has created (and continues to create) hundreds of free resources to explore.
If you’d like a more personalized look into the potential of Snowflake for your business, definitely book one of our free Snowflake migration assessment sessions. These casual, informative sessions offer straightforward answers and honest advice for moving your data to Snowflake.