Data is rapidly becoming the hottest commodity within our global economy. It’s estimated that the global data monetization market size will grow from $2.3 billion in 2020 to $6.1 billion by 2025 at a Compound Annual Growth Rate (CAGR) of 21.7 percent during the forecast period.
As enterprises continue to adopt the Snowflake Data Cloud for their cloud data platform needs, a key piece of functionality that enterprises can leverage is Snowflake Data Sharing.Â
Data sharing allows you to easily provide data to end consumers or consume data from other producers.
Today, we’re going to take a look at Snowflake data sharing, the Snowflake data marketplace, the different ways to share data between Snowflake accounts & reader accounts, and go through a tutorial to show you step by step how to leverage Snowflake data sharing.
What Is Snowflake Data Sharing?
Simply put, data sharing allows you to share selected objects with another Snowflake account or with a reader account (more on that later). One of the primary benefits of data sharing is that data isn’t copied or transferred between accounts.
Why is this a massive advantage for organizations? Well, when building data pipelines and data products, it’s very common to move data around between databases and various systems so you can blend data together. Â
Let’s say you have transactional data within your online transactional processing (OLTP) database and you want to combine it with some external data for a machine learning model. Historically, organizations would export data into a data lake, import that external data into their data lake, and leverage tools like Apache Spark to perform some analysis.
What if instead you copied your data into Snowflake and that external data source was able to share data with your organization without you having to load it? This removes the hurdle of having to keep copies of data in sync with each other, saving on storage, compute cost, and maintenance.
Imagine that your company generated a lot of desirable information that allows other companies to make informed decisions. For example, what if your company could provide accurate estimates for how long a product would take to deliver based on proprietary information and you wanted to sell that information to your customers?
This is exactly what Snowflake data sharing allows you to do.
What are the Different Ways to Share Data?
Snowflake gives you a few different options for sharing data. You may want to explicitly share data with one other entity or you may want to make your data available for any and all consumers. Â
Snowflake provides the ability to use direct shares, data exchanges, or the Snowflake Data Marketplace. Direct shares are the simplest form of data sharing and are meant for account-to-account sharing. Data exchanges and the Snowflake Data Marketplace are meant for broader access to your data. Â
Data exchanges are your hub for securely collaborating between invited members whereas the Snowflake Data Marketplace is available to all Snowflake accounts in non-VPS regions.
Once you’ve decided that you want to either monetize or share your data with other organizations, there are a few decisions you need to make on how you want to share that data, namely:
- Who is going to pay for compute resources?
- How do you want to implement security?
- How broadly do you want to share your data?
When providing data through a share, you can either share data with another Snowflake account or through reader accounts. When sharing data with another Snowflake account, the consumer is responsible for the compute objects (virtual warehouses) used to query the shared data.Â
If the end consumer does not have a Snowflake account, a reader account can be set up instead. This will leverage compute objects within the data provider’s account.
To share data securely, Snowflake provides the ability to generate secure views, secure materialized views, and secure UDFs. These types of objects allow consumers to read data without having visibility into the underlying SQL that’s generating the consumable data.Â
Secure [materialized] views are specifically designed to not leverage any optimizations that may accidentally expose business logic and ensure that users have no access to the underlying data.
What is the Snowflake Data Marketplace?
The Snowflake Data Marketplace utilizes Snowflake Secure Data Sharing to connect providers of data with consumers of data.
The Snowflake Data Marketplace allows for consumers of data to discover and access a variety of third-party data. If your company needs to source and consume data from multiple sources in order to drive analytics or machine learning models, you now have a one-stop-shop location to do so.
When deciding to list data on the Snowflake Data Marketplace, there are two types of listings to choose from, Standard and Personalized.
Standard Data Listings provide consumers with instant access to published data sets and generally provide generic, aggregated, or non-customer-specific data.
Personalized Data Listings on the other hand allow consumers to request specific data sets from producers. It may be advantageous for a data producer to provide a generic standard data listing and allow for data consumers to request more personalized data as required for more profit.
How Do I Implement Snowflake Data Sharing?
Now that we’ve talked about the different types of data sharing and how you can leverage it, let’s go step by step through setting up a data share, creating a secure view, and sharing it both directly and with the Snowflake Data Marketplace.
For this example, we have a use case where we want to blend some customer information and some marketing information to provide as a data share to consumers. To offer that information to consumers, we’re going to need to create:
- A sales and marketing database with schemas and tables
- A database to house our shares
- Permissions to allow the shared view to access our sales and marketing data
- A secure view to blend data together
The data definition language (DDL) for this looks like the following:
-- Objects should be owned by SYSADMIN
use role SYSADMIN;
-- Create sales objects
create database sales;
create schema sales.my_schema;
create table sales.my_schema.customer (
customer_id int,
customer_name string
);
-- Create marketing objects
create database marketing;
create schema marketing.my_schema;
create table marketing.my_schema.conversions (
customer_id int,
campaign string,
converted boolean
);
-- Create database to hold shares
create database sharedb;
create schema sharedb.shares;
-- Sample view to be shared
-- This will require granting REFERENCE_USAGE on the sales db and marketing db
create secure view as sharedb.shares.sharedview as
select customer.customer_id, customer.customer_name, conversions.campaign, conversions.converted
from sales.my_schema.customer customer
join marketing.my_schema.conversions conversions on customer.customer_id = conversions.customer_id
where converted;
-- Need to use ACCOUNTADMIM to create share
use role ACCOUNTADMIN;
-- Create share
create share myshare;
-- Grants should be owned by SECURITYADMIN
use role SECURITYADMIN;
-- Need to grant usage and reference_usage on objects
grant usage on database sharedb to share myshare;
grant usage on schema sharedb.shares to share myshare;
grant reference_usage on database sales to share myshare;
grant reference_usage on database marketing to share myshare;
grant select on view sharedb.shares.sharedview to share myshare;
Visually, this looks like the following:
Great! We now have the base for our share. Let’s look at the different ways to allow consumers to access this share
Direct Share
As previously mentioned, a direct share is the simplest way to allow consumers to access your shared data. This will rely on the data provider having access to the account ids for the consumer accounts.
The DDL for allowing another account to access your share is:
-- Need to use ACCOUNTADMIN to alter shares
use role ACCOUNTADMIN;
-- Need to alter the share to add an account
alter share myshare add accounts=ABC123;
If you want to verify that your share was added correctly or later on view if your share has been used in the consuming account, you can use the following:
show shares;
show grants of share myshare;
The first command will output all shares within your account. You can filter by kind OUTBOUND to view data shares you’re providing. The second command will output if consumers have created an INBOUND share of your OUTBOUND share. This is essential if you want to see if your shares are being used and by which accounts.
How Do I Setup a Data Share as a Consumer?
Configuration for a data share is a two-way handshake between accounts. In other words, a producer account cannot force the creation of objects within another account; the consuming account needs to establish a share resource within its own account.
This is how you would establish a consumer of the share from our provider account:
/* Bring the shared database into your account by creating a database from the share. */
/* */
/* In the following commands, the share name must be fully qualified by replacing */
/* with the name of the account that provided the share, e.g. */
-- Shares need to be created by ACCOUNTADMIN
use role ACCOUNTADMIN;
-- View details about the share
desc share .myshare;
-- Create database from share
create database _share from share .myshare
-- Grant privileges on the database to other roles (e.g. SYSADMIN) in your account.
grant imported privileges on database _share to SYSADMIN;
-- Now you can query the view
use role SYSADMIN;
show views;
use warehouse ;
select * from sharedview;
Visually, we now have the following:
Sharing Data to a Reader Account
If you want to share data with a consumer that doesn’t have a Snowflake account, you will need to use a reader account. When you create a reader account, a new account name will be generated that the user will use to log in and query your data. You will need to establish a username and password for this account when you create it.
Let’s create a reader account and add it to our data share.
-- Need to be ACCOUNTADMIN to create reader accounts
use role accountadmin;
create managed account reader_acct1
admin_name = 'MySecretUsername' , admin_password = 'MySuperSecretPassword' ,
type = reader;
/*
OUTPUTS
+-------------------------------------------------------------------------------+
| status |
|-------------------------------------------------------------------------------|
| {"accountName":"RE47190","loginUrl":"https://re47190.snowflakecomputing.com"} |
+-------------------------------------------------------------------------------+
*/
-- It takes a moment to create. Once created we need to grant access to share.
-- The previous command will output the accountName for the reader account.
-- Need to alter the share to add an account
alter share myshare add accounts=RE47190;
Once the share has been set up within the provider account, you will need to login to the reader account and set up the share and other objects just as with a direct share to another Snowflake account. Once this is done, the objects look like this:
Best Practices with Data Shares
When providing and consuming data via shares, there are a number of best practices that you should follow. These include things like validating data shares, auditing access to data shares, and adding/removing objects from shares as necessary.
Let’s dig into how to perform each of these.
How Do I Validate Data Shares?
Before putting secure [materialized] views into your share, it’s important to ensure that access to your data matches what you expect. If you’re using Secure UDFs or filtering by account name within your secure [materialized] view, you’ll want to ensure that your query logic is being applied correctly.
Snowflake provides a session parameter to simulate accessing data as if you were a data share consumer. Use the following commands to validate your view:
-- Test the table and secure view by first querying the data as the provider account.
select count(*) from sharedb.shares.sharedview;
select top 1 * from sharedb.shares.sharedview;
-- Set session parameter to use the shared customer account
alter session set simulated_data_sharing_consumer=ABC123;
-- Validate the simulated data sharing consumer is able to access the appropriate data
select count(*) from sharedb.shares.sharedview;
select top 10 * from sharedb.shares.sharedview;
Once you’ve validated that the consumer is only able to access the appropriate data, then you can alter your share to add the consumer account.
How Do I Audit Access To Data Shares?
When sharing data to many accounts, it’s recommended to audit that those accounts are set up to consume that data and utilize that share. If you have accounts that aren’t using a configured data share and don’t plan to, you should limit your security footprint and remove those accounts from your share.
To audit these data shares, there are three SQL commands that you can execute.
show shares;
This command will show you what shares have been set up on your account (both incoming and outgoing), which accounts those shares are configured with, the owner of the share, and what database is being used in the share. This will allow you to audit that the correct accounts have access to the correct shared databases.
show grants to share myshare;
This command will show you what objects have been added to myshare. This will allow you to validate that only the objects that you want to share are actually available in the share to consuming accounts.
show grants of share myshare;
This command is used to determine if your share is being used by consuming accounts. This command will list all the accounts that have created a database from the share. If no accounts have created a database from the share, the results will be empty.
How Do I Add and Remove Objects From a Share?
Once a share has been created, you can add and remove objects as necessary. When you add or remove an object from a share, the share is instantaneously updated to reflect those changes both in your account and the consuming accounts.
Let’s say we’ve created another secure view within our shared named campaigns which allows consumers to see all the marketing campaigns that have been run. We would create this secure view and then add it to our share.
-- Grants should be owned by SECURITYADMIN
use role SECURITYADMIN;
create secure view as sharedb.shares.campaigns as
select customer.customer_id, customer.customer_name, conversions.campaign, conversions.converted
from sales.my_schema.customer customer
join marketing.my_schema.conversions conversions on customer.customer_id = conversions.customer_id;
grant select on view sharedb.shares.campaigns to share myshare;
If we decide later on that we want to remove this from the share and subsequently all the consumers of our share, we would run the following:
-- Grants should be owned by SECURITYADMIN
use role SECURITYADMIN;
revoke select on view sharedb.shares.campaigns to share myshare;
In Closing
Hopefully, by now, you have a clearer understanding of how data sharing works in Snowflake as well as a foundation in place to get started sharing (and monetizing) your data.Â
If you’re looking for more Snowflake help, we’d genuinely love to help! As an Elite Snowflake Partner, phData has a wide range of experience helping businesses of all sizes succeed with Snowflake.Â
Whether you need help creating a solid plan to implement data sharing at your organization or if you just need an expert to answer a few questions, phData would love to help!
FAQs
When sharing data to another Snowflake account, the consumer of the account pays for the credit usage to query the underlying data in the share. If you create a reader account for a non-Snowflake customer to consume data, the reader account is tied to the account that created it, and the account that created it will pay for credit consumption.
Yes! This is the primary benefit of data sharing. All sharing is accomplished through Snowflake’s unique services layer and metadata store and zero data is transferred to the consuming account’s storage.