“Is our data safe in the cloud?”
Many of us have either pondered this question or encountered it in meetings. The Snowflake AI Data Cloud prioritizes data protection and offers a comprehensive suite of features designed to safeguard data against unauthorized access.
These features can be tailored to meet your organization’s specific needs. One such feature is Column-Level Security, which allows you to control access to sensitive data in your tables and views, ensuring that only authorized users can access specific columns.
In this blog, we will discuss:
What column-level security is in Snowflake
Basic and advanced methods for implementing column-level security via Dynamic Data Masking and External Tokenization
Maintaining (managing and tracking) column-level security policies
Best practices for implementing column-level security in Snowflake
What is Column-Level Security in Snowflake?
Column-level security in Snowflake involves applying a masking policy to any column within a table or view. This policy hides the column’s contents from unauthorized users while allowing authorized users to access the sensitive data during query runtime. These masking policies are schema-level objects designed to protect sensitive information without altering the actual data in the tables.
When a user attempts to retrieve data, the masking policy evaluates whether the user has the necessary permissions to view the sensitive information, ensuring that only authorized users can see it.
The policy contains rules, generally defined via a CASE construct (IF-THEN-ELSE), that mask the data fully or partially for most users while returning it as-is for certain higher-privileged users who need access for regulatory or operational purposes.
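As a minimal sketch of what such a rule can look like (the policy name, role name, and masking choices here are illustrative, not from the examples later in this post):

-- Sketch: an authorized role sees the full value; everyone else sees a partial mask
CREATE MASKING POLICY phone_mask AS (val STRING) RETURNS STRING ->
CASE
    WHEN CURRENT_ROLE() = 'SUPPORT_LEAD' THEN val        -- authorized: full value
    ELSE REPEAT('*', 6) || RIGHT(val, 4)                 -- everyone else: last four digits only
END;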
By using Role-Based Access Control (RBAC), these masking policies allow users to govern the retrieval of sensitive data in Snowflake. The diagram below shows how access to sensitive data can be governed using masking policies.
Sensitive data can be PII (Personally Identifiable Information) fields like phone number, email, SSN, etc.; PHI (Protected Health Information) fields like medical history, test results, fingerprints, faceprints, etc.; or financial information like credit card numbers, bank account numbers, etc.
With Snowflake, organizations can determine which fields need protection from unauthorized access and enforce this using column-level security.
There are two major types of data masking policies currently used in Snowflake: External Tokenization and Dynamic Data Masking. We will discuss each in detail in the sections below.
External Tokenization
External Tokenization allows accounts to replace sensitive data with indecipherable tokens before loading it into Snowflake and to reverse the process (detokenize) during query execution. External Tokenization leverages masking policies in combination with external functions.
In practice, this means calling the external function inside the masking policy definition.
This is particularly useful for organizations that already have PII data encrypted with a passkey in other data systems, such as legacy databases and object stores like AWS S3.
In that scenario, the encryption and decryption code resides outside Snowflake, for example in an AWS Lambda, and we can call the decryption logic from our masking policy accordingly. This also allows the same code to be reused for encrypting new PII data in Snowflake, preventing code redundancy.
However, with the popularity of Snowpark, many organizations may choose to migrate the tokenization code into Snowflake itself and perform the PII data masking with Snowpark functions instead of External Functions, as sketched below.
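As a rough sketch of that approach, the same Fernet logic used in the Lambda in the next section could live in a Snowflake Python UDF. Everything here is an assumption for illustration, including the function name and the hard-coded key placeholder; a real deployment would pull the key from a secret rather than the source code.

CREATE OR REPLACE FUNCTION snowpark_detokenize(val STRING)
RETURNS STRING
LANGUAGE PYTHON
RUNTIME_VERSION = '3.10'
PACKAGES = ('cryptography')
HANDLER = 'detokenize'
AS
$$
import base64
import hashlib
from cryptography.fernet import Fernet

SECRET_KEY = 'replace-with-a-real-secret'  # placeholder for illustration only
key = base64.urlsafe_b64encode(hashlib.sha256(SECRET_KEY.encode()).digest())
fernet = Fernet(key)

def detokenize(val):
    # Pass empty/NULL values through; otherwise decrypt the Fernet token
    if val is None or val == '':
        return val
    return fernet.decrypt(val.encode()).decode()
$$;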
Either way, External Tokenization has given organizations an option to centralize their data governance process.
External vendors such as ALTR and Baffle provide tokenization solutions. Alternatively, AWS Lambda, Google Cloud Functions, and Azure Functions allow us to write custom tokenization code and use it in Snowflake. We will discuss how to use AWS Lambda as an External Function to implement External Tokenization in Snowflake.
How to Implement External Tokenization in Snowflake
Below is the Python code in the AWS Lambda function used for tokenization (string-tokenizer):
import json
import base64
import hashlib
import os

import pandas as pd
from cryptography.fernet import Fernet


class PIITokenizer:
    def __init__(self, secret_key: str):
        """
        Initialize the tokenizer with a secret key.
        :param secret_key: A string used to derive an encryption key.
        """
        # Derive a 32-byte encryption key from the secret key
        key = hashlib.sha256(secret_key.encode()).digest()
        self.key = base64.urlsafe_b64encode(key[:32])
        self.fernet = Fernet(self.key)

    def tokenize(self, input_json: str) -> str:
        """
        Tokenize or detokenize sensitive PII data.
        :param input_json: The request body containing the rows to process.
        :return: The processed rows as a JSON string.
        """
        # Parse the input JSON string into a Python dictionary
        input_data = json.loads(input_json)

        # Convert the data into a DataFrame
        df = pd.DataFrame(input_data["data"], columns=["row_number", "value", "action"])

        # Check if the DataFrame is empty
        if df.empty:
            return json.dumps({"data": None})

        # Apply the specified action (encrypt or decrypt) to the "value" column
        df["value"] = df.apply(
            lambda row: (
                ""
                if row["value"] == ""
                else None
                if row["value"] is None
                else self.fernet.encrypt(row["value"].encode()).decode()
                if row["action"] == "encrypt"
                else self.fernet.decrypt(row["value"].encode()).decode()
                if row["action"] == "decrypt"
                else "Invalid operation type"
            ),
            axis=1,
        )

        # Convert the processed DataFrame back to the required JSON format
        output_data = {"data": df[["row_number", "value"]].values.tolist()}
        return json.dumps(output_data)


def lambda_handler(event, context):
    # 200 is the HTTP status code for "ok".
    status_code = 200

    try:
        # The secret key is stored as an environment variable.
        secret_key = os.environ.get("secret_key")
        tokenizer = PIITokenizer(secret_key)

        # The "body" of the event contains the input rows sent by Snowflake.
        event_body = event["body"]
        json_compatible_string_to_return = tokenizer.tokenize(event_body)
    except Exception as err:
        # 400 implies some type of error.
        status_code = 400
        # Tell the caller what this function could not handle.
        raise err

    # Return the processed rows and the HTTP status code.
    return {
        'statusCode': status_code,
        'body': json_compatible_string_to_return
    }
Note: The Lambda requires two layers provided by AWS: AWSLambdaPowertoolsPythonV3-python310-x86_64 and AWSSDKPandas-Python310.
To use this Lambda as an external function, we will have to:
Create an API Proxy Gateway so that the lambda is exposed to Snowflake securely. To set that up, please follow these instructions.
Create an API Integration in Snowflake for the API gateway created above. Below is the code to do that:
create or replace api integration external_tokenization_lambda_apigateway_integration
api_provider=aws_api_gateway
api_aws_role_arn='arn:aws:iam::**************:role/snowflake-external-function-role' -- replace this by the IAM role created for Snowflake in Step 1
api_allowed_prefixes=('https://*********.execute-api.us-west-2.amazonaws.com/snowflake-external-function-api-stage/' --replace this by the invocation URL of the API gateway stage created in Step 1
)
enabled=true;
Do a DESC on the API integration created above to get the IAM_USER_ARN and EXTERNAL_ID, as shown below.
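For reference, the command to pull those values (using the integration name created in the previous step) is:

DESC INTEGRATION external_tokenization_lambda_apigateway_integration;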
Edit the Trust Relationships of the AWS IAM role created for Snowflake in Step 1 to update the IAM_USER_ARN and EXTERNAL_ID we got in Step 3 (as shown below).
Create the external function on the API integration and the API paths created in Steps 1 and 2 (as shown below).
CREATE EXTERNAL FUNCTION string_tokenizer( value VARCHAR,operation_type VARCHAR)
RETURNS VARCHAR
API_INTEGRATION = external_tokenization_lambda_apigateway_integration
AS 'https://*********.execute-api.us-west-2.amazonaws.com/snowflake-external-function-api-stage/StringTokenize'-- replace this by the API Gateway resource POST method invocation URL
;
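Before wiring the function into a masking policy, a quick smoke test can confirm the round trip works (the sample value here is made up):

-- Encrypt a sample value, then decrypt it back
SELECT string_tokenizer('john.doe@example.com', 'encrypt') AS tokenized;
SELECT string_tokenizer(string_tokenizer('john.doe@example.com', 'encrypt'), 'decrypt') AS round_trip; -- should return the original value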
Once the external function is created, we need to create a role and user to govern the masking process. Ideally, only the data steward, data governance lead, or security officer should be granted this role with access to create masking policies. We can do this with the below set of commands:
--create a new role to be granted to data steward/data governance lead /security officer
use role useradmin;
CREATE ROLE masking_admin;
--grant that role the privileges needed to create masking policies in the schema
use role securityadmin;
GRANT USAGE ON DATABASE <db> TO ROLE masking_admin;
GRANT USAGE ON SCHEMA <db>.<schema> TO ROLE masking_admin;
GRANT USAGE ON WAREHOUSE <warehouse> TO ROLE masking_admin;
GRANT CREATE MASKING POLICY ON SCHEMA <db>.<schema> TO ROLE masking_admin;
--assign the role to data steward/data governance lead /security officer
use role useradmin;
grant role masking_admin to user dummy;-- replace dummy with the user id of the data steward/data governance lead /security officer
--grant the masking_admin role the privilege to use the external function
use role <function_owner_role>; -- the role that owns the external function
GRANT USAGE ON FUNCTION string_tokenizer(VARCHAR,VARCHAR) TO ROLE MASKING_ADMIN;
Now, we will create the masking policy using the external function. This can be done with the following set of commands:
USE ROLE masking_admin;
-- create masking policy to grant PII_ANALYST users to see the decrypted data and all other users to see the encrypted data
create or replace masking policy <db>.<schema>.STRING_DECRYPT as (val string) returns string ->
case
when current_role() in ('PII_ANALYST') then string_tokenizer( val,'decrypt')
else val
end;
Once the masking policy is created, we will assign it to the tables containing the tokenized data. For this example, we will use a table named CUST_DATA, which contains tokenized customer email addresses and SSNs. We will apply the masking policy as shown below.
--use the role which owns the object
USE ROLE <object_owner_role>;
alter table if exists CUST_DATA modify column CUST_EMAIL set masking policy <db>.<schema>.STRING_DECRYPT;
alter table if exists CUST_DATA modify column CUST_SSN set masking policy <db>.<schema>.STRING_DECRYPT;
Now, if we query the CUST_DATA table using the owner role or any other role that has SELECT access on it, we'll see the tokenized form of the data (shown below).
But when we query the table using the PII_ANALYST role, we see the decrypted original data.
Note: As a prerequisite, the PII_ANALYST role must have SELECT access on the table and USAGE access on the database and schema.
This is how we can implement Column-Level Security on PII fields in Snowflake using Snowflake RBAC (Role-Based Access Control), AWS Lambda, Snowflake External Functions, and Masking Policies.
Dynamic Data Masking
Dynamic Data Masking uses Snowflake built-in functions to mask sensitive data during query runtime. The schema-level masking policies are added to the columns in tables and views, which contain the functional logic to mask the data.
During query execution, the masking policy is applied to the column wherever it appears, determining whether Snowflake users see a plain-text value, a partially masked value, or a fully masked value based on the policy rules, SQL execution context, and role hierarchy.
How to Implement Dynamic Data Masking in Snowflake
We will implement Dynamic Data Masking on the EMAIL and SSN columns of a table called CUST_INFO for this demo. We will use the same roles we used in the External Tokenization implementation example above.
While creating the masking policy on the email field, we will do partial masking, i.e., retain the domain but mask the username, and for the SSN field, we will do full masking. Below is the set of SQL statements required for this:
-- Create the masking policies
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY <db>.<schema>.email_mask AS (val string) RETURNS string ->
CASE
WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
ELSE REGEXP_REPLACE(SPLIT_PART(val,'@',1), '([a-zA-Z0-9])', '*')||'@'||SPLIT_PART(val,'@',2) -- mask the user names but retain the domain value and punctuations
END;
CREATE OR REPLACE MASKING POLICY <db>.<schema>.ssn_mask AS (val string) RETURNS string ->
CASE
WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
ELSE REPEAT('*', LENGTH(val)) -- mask fully
END;
Then we will apply the masking policies to the respective fields as shown below:
-- Apply the Masking Policies
alter table if exists <db>.<schema>.CUST_INFO modify column CUST_EMAIL set masking policy email_mask;
alter table if exists <db>.<schema>.CUST_INFO modify column CUST_SSN set masking policy ssn_mask;
Now, if we query using any role other than PII_ANALYST, below is how we would see the data:
However, if we use the PII_ANALYST role to query the data, we will see the original data as shown below:
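To reproduce this check end to end, we can run the same query under the two roles (DATA_ENGINEER here stands in for any hypothetical role without PII access):

-- Same query, two roles
USE ROLE DATA_ENGINEER;   -- hypothetical role without PII access
SELECT CUST_EMAIL, CUST_SSN FROM <db>.<schema>.CUST_INFO LIMIT 5;  -- returns masked values

USE ROLE PII_ANALYST;
SELECT CUST_EMAIL, CUST_SSN FROM <db>.<schema>.CUST_INFO LIMIT 5;  -- returns the original values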
Advanced Column-Level Security Implementations
1. Using Tokenization Function to Encrypt Inside Snowflake
If we want to use the same tokenization external function to encrypt PII data inside Snowflake, we can first create a masking policy for encryption, as shown below, and then follow the policy-assignment steps above to attach this new masking policy to the required tables and columns.
USE ROLE masking_admin;
--PII Analyst sees the data as is after encryption but others will see the masked value
create or replace masking policy <db>.<schema>.STRING_ENCRYPT as (val string) returns string ->
case
when current_role() in ('PII_ANALYST') then val else string_tokenizer( val,'encrypt')
end;
2. Tag-Based Masking Policy
A tag-based masking policy integrates the object tagging and masking policy features, enabling the application of a masking policy to a tag using the ALTER TAG command. When the data type specified in the masking policy signature matches the data type of a column, the tagged column is automatically safeguarded according to the conditions defined in the masking policy. These conditions can be designed to secure column data based on the assigned tag name or tag value. Tag-based policies can also consolidate multiple masking rules into a single policy, easing management.
We will use the scenario from above to demonstrate this.
First, we will create a tag with specific allowed values corresponding to our sensitive data attribute types, and then assign the tag with the respective values to the EMAIL and SSN columns of the CUST_INFO table.
--CREATE TAG with some allowed values
CREATE OR REPLACE TAG <db>.<schema>.SENSITIVE_INFO
ALLOWED_VALUES 'EMAIL', 'SSN', 'NAME', 'PHONE_NUMBER', 'BANK_ACCOUNT_NO', 'ADDRESS';

--Assign Tag to required columns
ALTER TABLE <db>.<schema>.CUST_INFO MODIFY COLUMN CUST_EMAIL SET TAG SENSITIVE_INFO = 'EMAIL';
ALTER TABLE <db>.<schema>.CUST_INFO MODIFY COLUMN CUST_SSN SET TAG SENSITIVE_INFO = 'SSN';
If we have other tables and columns with customer addresses, names, or bank account numbers, we can tag those columns in the same way using the ALTER TABLE ... MODIFY COLUMN ... SET TAG command and the corresponding values.
Now, we will create the masking policy, in which we check the tag value on the column using SYSTEM$GET_TAG_ON_CURRENT_COLUMN and design our masking strategy around it. For email and address we use partial masking, and full masking for the rest of the fields. Here's the SQL to do this:
--Create Masking Policy
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN VAL ELSE
    CASE
        WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db>.<schema>.SENSITIVE_INFO') IN ('EMAIL')
            THEN REGEXP_REPLACE(SPLIT_PART(VAL, '@', 1), '([a-zA-Z0-9])', '*') || '@' || SPLIT_PART(VAL, '@', 2) -- mask the username but retain the domain and punctuation
        WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db>.<schema>.SENSITIVE_INFO') IN ('SSN', 'NAME', 'PHONE_NUMBER', 'BANK_ACCOUNT_NO')
            THEN REPEAT('*', LENGTH(VAL)) -- mask fully
        WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db>.<schema>.SENSITIVE_INFO') IN ('ADDRESS')
            THEN REPEAT('*', CEIL(LENGTH(VAL) / 2)) || SUBSTR(VAL, CEIL(LENGTH(VAL) / 2) + 1) -- mask the first half of the address
        ELSE VAL
    END
END
;
Once the masking policy is created, all we need to do is assign the masking policy to the tag (see below).
ALTER TAG <db>.<schema>.SENSITIVE_INFO SET MASKING POLICY <db>.<schema>.SENSITIVE_INFO_MASK;
This simplifies administration, allowing us to manage multiple sensitive fields with a single policy. It also scales: we can add the tag to any future tables or columns holding sensitive data, and the data will be masked from the point of creation.
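To audit which columns carry the tag (and therefore the policy), Snowflake's INFORMATION_SCHEMA table functions can help; a quick sketch, keeping the placeholder names from above:

-- List the tags set on all columns of CUST_INFO
SELECT *
FROM TABLE(<db>.INFORMATION_SCHEMA.TAG_REFERENCES_ALL_COLUMNS('<db>.<schema>.CUST_INFO', 'table'));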
3. Role Hierarchy and Context Functions
Since Snowflake RBAC allows roles to be granted to other roles, forming a hierarchy, we might need to design our masking policies so that roles inheriting the privileges of a role with PII access also get PII access.
For example, in our use case, the PII_ANALYST role is granted to SYSADMIN. We would want any user with the SYSADMIN role to have the same or greater privileges than the PII_ANALYST role. However, using the CURRENT_ROLE() context function in the masking policy would deny PII access to the parent roles of PII_ANALYST, because it matches only the session's current role.
We can prevent this by using the context function IS_ROLE_IN_SESSION(<Role Name>), as shown below.
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN IS_ROLE_IN_SESSION('PII_ANALYST') THEN VAL
     ELSE REPEAT('*', LENGTH(VAL)) -- full masking shown here for brevity
END;
Now, if we query the CUST_INFO table using the SYSADMIN role, we will see the original data, because SYSADMIN inherits the PII_ANALYST role, as shown below.
However, any other role (e.g., DATA_ENGINEER) that is not in the role hierarchy of PII_ANALYST will not have PII access, as shown below:
Another scenario arises when we access objects that are not owned by the role we are using.
For example, in our use case, suppose the PII_ANALYST role has created a view with PII fields like EMAIL and SSN. When other users query that view, they ideally should not have access to the PII data unless otherwise specified. To implement this, we can use the INVOKER_ROLE() context function, as shown below.
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN INVOKER_ROLE() IN ('PII_ANALYST') THEN VAL
     ELSE REPEAT('*', LENGTH(VAL)) -- full masking shown here for brevity
END;
However, if the querying role inherits the privileges of the PII_ANALYST role that created the view, then that role may need PII access too. In such a case, we can use the IS_GRANTED_TO_INVOKER_ROLE(<role name>) context function, as shown below, to ensure the roles in the hierarchy of the view owner have the same PII privileges as the PII_ANALYST role.
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN IS_GRANTED_TO_INVOKER_ROLE('PII_ANALYST') THEN VAL
     ELSE REPEAT('*', LENGTH(VAL)) -- full masking shown here for brevity
END;
Important Considerations
A masking policy must have the same data type for input and output, i.e., we cannot define the input data type as a number and return a string.
Even when the policy's input and output data types match, if we assign that policy to a column with a different data type (i.e., our masking policy is string-based and we assign it to a NUMBER column), the query will ignore the masking policy and return the data as-is for that field.
Before dropping a masking policy, you must unset the policy from the column or tag using ALTER TABLE, ALTER VIEW, or ALTER TAG.
External Tokenization on a table with a huge volume of data might cause performance issues, since every query over the protected columns calls the external function to detokenize.
When inserting data from a source column with a masking policy into a target column without one, users with privileges on the target column may see unmasked data if their role allows viewing unmasked values in the source. Conversely, if a user sees masked data in the source column and inserts it into the target column, the data remains masked, potentially leading to a mix of masked and unmasked values in the target column.
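A short sketch of that last behavior (the table and role names are hypothetical):

-- A role that sees masked data propagates the masked form on insert
USE ROLE DATA_ENGINEER;              -- hypothetical role without PII access
INSERT INTO cust_archive (cust_ssn)  -- hypothetical target column without a masking policy
SELECT cust_ssn FROM cust_data;      -- the masked values are what get written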
Best Practices
To ensure consistent data security, carefully apply masking policies, review queries involving masked columns before sharing tables or views, and identify all potential target columns where data from the source column might appear.
Do not use NULL as the masking value, as that would cause COUNT and other aggregation queries run by a user/role who sees the masked data to return incorrect results. Masking with a non-NULL value allows such users/roles to run certain aggregate queries (e.g., distinct counts) with correct output, without the PII data being visible.
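One common way to get this (a sketch, not one of the policies defined above) is to mask with a deterministic hash such as SHA2: the same input always yields the same opaque output, so row counts and distinct counts stay intact while the raw value stays hidden.

-- Sketch: deterministic mask preserves COUNT and COUNT(DISTINCT ...) semantics
CREATE OR REPLACE MASKING POLICY ssn_hash_mask AS (val STRING) RETURNS STRING ->
CASE WHEN IS_ROLE_IN_SESSION('PII_ANALYST') THEN val
     ELSE SHA2(val) -- opaque but stable, so aggregates remain meaningful
END;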
The following table, adapted from the Snowflake documentation, summarizes some considerations for data governance strategies when masking sensitive data.
Policy Action             | Centralized Management | Hybrid Management | Decentralized Management
Create policies           | Security Officer       | Security Officer  | Individual Teams
Apply policies to columns | Security Officer       | Individual Teams  | Individual Teams
Please note that your organization will need to decide on a governance structure for who creates and manages the masking policies and who assigns them to the necessary sensitive fields in the objects.
Closing
Column-level security in Snowflake empowers organizations to protect sensitive data with precision, ensuring compliance and reducing risks. By leveraging features like masking policies and role-based access control, you can confidently share data while maintaining granular security.
As you implement these practices, remember to regularly review and refine your security configurations to align with evolving needs and compliance requirements.
If you’re looking to strengthen the security of your Snowflake account, we recommend attending one of our free Snowflake Security Assessments, which will help your organization identify vulnerabilities and offer best practices for safeguarding your data in Snowflake.
FAQs
What are Row Access Policies in Snowflake?
Row-Level Security (RLS) in Snowflake is a mechanism to control access to specific rows in tables or views based on dynamic conditions defined in Row Access Policies. These schema-level objects evaluate user roles and other criteria at query runtime to enforce data access rules for operations like SELECT, UPDATE, DELETE, and MERGE. This allows centralized governance, enabling fine-grained access control to protect sensitive data.
What are some other security features provided by Snowflake?
Snowflake offers advanced, industry-leading security features to safeguard your account, users, and stored data. The following topics, primarily designed for administrators (such as users with the ACCOUNTADMIN, SYSADMIN, or SECURITYADMIN roles), highlight key security measures:
Federated Authentication and SSO
Key-pair authentication and rotation
Multi-factor authentication (MFA)
Snowflake OAuth and External OAuth
Authentication policies
AWS VPC interface endpoints for internal stages
Azure private endpoints for internal stages
Integration with AWS PrivateLink, Azure Private Link, and Google Cloud Private Service Connect
Snowflake Sessions and Session Policies
SCIM (System for Cross-domain Identity Management)
Role-based access control (RBAC)
End-to-end encryption
Encryption key management
Tri-Secret Secure
Trust Center
These features ensure comprehensive protection for your Snowflake environment.