January 15, 2025

What is Column-Level Security in Snowflake?

By Pratik Datta

“Is our data safe in the cloud?”

Many of us have either pondered this question or encountered it in meetings. The Snowflake AI Data Cloud prioritizes data protection and offers a comprehensive suite of features designed to safeguard data against unauthorized access. 

These features can be tailored to meet your organization’s specific needs. One such feature is Column-Level Security, which allows you to control access to sensitive data in your tables and views, ensuring that only authorized users can access specific columns.

In this blog, we will discuss:

  • What column-level security is in Snowflake 

  • Basic and advanced methods for implementing column-level security via Dynamic Data Masking and External Tokenization

  • Maintaining (managing and tracking) column-level security policies

  • Best practices for implementing column-level security in Snowflake

What is Column-Level Security in Snowflake?

Column-level security in Snowflake involves applying a masking policy to any column within a table or view. This policy hides the column’s contents from unauthorized users while allowing authorized users to access the sensitive data during query runtime. These masking policies are schema-level objects designed to protect sensitive information without altering the actual data in the tables. 

When a user attempts to retrieve data, the masking policy evaluates whether the user has the necessary permissions to view the sensitive information, ensuring that only authorized users can see it.

The policy contains rules, generally defined via a CASE construct (IF-THEN-ELSE logic), that fully or partially mask the data for most users while leaving it as is for certain higher-privileged users who need access for regulatory or operational purposes.
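
For illustration, here is a minimal sketch of such a policy (the policy and role names are hypothetical):

-- authorized roles see the real value; everyone else sees a fixed mask
CREATE MASKING POLICY simple_mask AS (val STRING) RETURNS STRING ->
  CASE
    WHEN CURRENT_ROLE() IN ('AUTHORIZED_ROLE') THEN val
    ELSE '***MASKED***'
  END;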

By using Role-Based Access Control (RBAC), these masking policies allow users to govern the retrieval of sensitive data in Snowflake. The diagram below shows how access to sensitive data can be governed using masking policies. 

Sensitive data can be PII (Personally Identifiable Information) fields like phone number, email, SSN, etc.; PHI (Protected Health Information) fields like medical history, test results, fingerprints, faceprints, etc.; or financial information like credit card numbers, bank account numbers, etc. 

With Snowflake, organizations can determine which fields need protection from unauthorized access and enforce this using column-level security.

There are two major types of data masking policies currently used in Snowflake: External Tokenization and Dynamic Data Masking. We will discuss them in detail in the sections below.

External Tokenization

External Tokenization allows accounts to replace sensitive data with indecipherable tokens before loading it into Snowflake and to reverse the process (detokenize) during query execution. External Tokenization leverages masking policies in combination with external functions.

In essence, we call the external function inside the masking policy definition.
This is particularly useful for organizations that already have PII data encrypted with a passkey in other data systems, such as legacy databases or object stores like AWS S3. 

In that scenario, the encryption and decryption code resides outside Snowflake, for example, in an AWS Lambda function. We can then call the decryption logic from our masking policy accordingly. This also allows the same code to be reused for encrypting new PII data in Snowflake, preventing code redundancy.

However, with the popularity of Snowpark, many organizations may decide to migrate the tokenization code into Snowflake itself and mask PII data using Snowpark functions instead of external functions.

Either way, External Tokenization gives organizations an option to centralize their data governance process.

External vendors such as ALTR and Baffle provide tokenization offerings. Alternatively, AWS Lambda, Google Cloud Functions, and Azure Functions allow us to write custom tokenization code and use it in Snowflake. Below, we discuss how to use an AWS Lambda function as a Snowflake external function to implement External Tokenization.

How to Implement External Tokenization in Snowflake

Below is the Python code in the AWS Lambda function used for tokenization (string-tokenizer): 

import json
import base64
import hashlib
import os

import pandas as pd
from cryptography.fernet import Fernet

class PIITokenizer:
    def __init__(self, secret_key: str):
        """
        Initialize the tokenizer with a secret key.
        :param secret_key: A string used to derive an encryption key.
        """
        # Derive a 32-byte encryption key from the secret key
        key = hashlib.sha256(secret_key.encode()).digest()
        self.key = base64.urlsafe_b64encode(key[:32])
        self.fernet = Fernet(self.key)

    def tokenize(self, input_json: str) -> str:
        """
        Tokenize (encrypt) or detokenize (decrypt) sensitive PII data.
        :param input_json: The request body in Snowflake's external function format.
        :return: A JSON string of processed rows in the same format.
        """
        # Parse the input JSON string into a Python dictionary
        input_data = json.loads(input_json)

        # Convert the data into a DataFrame
        df = pd.DataFrame(input_data["data"], columns=["row_number", "value", "action"])

        # Check if the DataFrame is empty
        if df.empty:
            return json.dumps({"data": None})

        # Apply the specified action (encrypt or decrypt) to the "value" column
        df["value"] = df.apply(
            lambda row: (
                ""
                if row["value"] == ""
                else None
                if row["value"] is None
                else self.fernet.encrypt(row["value"].encode()).decode()
                if row["action"] == "encrypt"
                else self.fernet.decrypt(row["value"].encode()).decode()
                if row["action"] == "decrypt"
                else "Invalid operation type"
            ),
            axis=1,
        )

        # Convert the processed DataFrame back to the required JSON format
        output_data = {"data": df[["row_number", "value"]].values.tolist()}
        return json.dumps(output_data)

def lambda_handler(event, context):
    # 200 is the HTTP status code for "ok".
    status_code = 200
    try:
        # The secret key is stored as an environment variable on the Lambda.
        secret_key = os.environ.get('secret_key')
        tokenizer = PIITokenizer(secret_key)
        # The request body contains the input rows sent by Snowflake.
        event_body = event["body"]
        json_compatible_string_to_return = tokenizer.tokenize(event_body)
    except Exception as err:
        # 400 implies some type of error; tell the caller what went wrong.
        status_code = 400
        json_compatible_string_to_return = str(err)
    # Return the result and HTTP status code.
    return {
        'statusCode': status_code,
        'body': json_compatible_string_to_return
    }

Note: Lambda requires two layers provided by AWS: AWSLambdaPowertoolsPythonV3-python310-x86_64 and AWSSDKPandas-Python310.

To use this Lambda as an external function, we will have to: 

  1. Create an API Gateway proxy so that the Lambda is exposed to Snowflake securely. To set that up, follow Snowflake's documentation on creating an AWS API Gateway endpoint for external functions.

  2. Create an API Integration in Snowflake for the API gateway created above. Below is the code to do that: 

create or replace api integration external_tokenization_lambda_apigateway_integration
    api_provider = aws_api_gateway
    api_aws_role_arn = 'arn:aws:iam::**************:role/snowflake-external-function-role' -- replace with the IAM role created for Snowflake in Step 1
    api_allowed_prefixes = ('https://*********.execute-api.us-west-2.amazonaws.com/snowflake-external-function-api-stage/') -- replace with the invocation URL of the API gateway stage created in Step 1
    enabled = true;

  3. Run DESC on the API integration created above to get the IAM_USER_ARN and EXTERNAL_ID, as shown below.
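
For reference, the command looks like this, using the integration created in Step 2:

DESC API INTEGRATION external_tokenization_lambda_apigateway_integration;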

  4. Edit the trust relationship of the AWS IAM role created for Snowflake in Step 1, updating it with the IAM_USER_ARN and EXTERNAL_ID obtained in Step 3.

  5. Create the external function using the API integration and the API path created in Steps 1 and 2 (as shown below).  

CREATE EXTERNAL FUNCTION string_tokenizer(value VARCHAR, operation_type VARCHAR)
    RETURNS VARCHAR
    API_INTEGRATION = external_tokenization_lambda_apigateway_integration
    AS 'https://*********.execute-api.us-west-2.amazonaws.com/snowflake-external-function-api-stage/StringTokenize'; -- replace with the API Gateway resource POST method invocation URL

  6. Once the external function is created, we need to create a role and user to govern the masking process. Ideally, this role should be granted only to the data steward, data governance lead, or security officer responsible for creating masking policies. We can do this with the following set of commands:

-- create a new role to be granted to the data steward/data governance lead/security officer
use role useradmin;
CREATE ROLE masking_admin;

-- allow that role to create masking policies in the schema
use role securityadmin;
GRANT USAGE ON DATABASE <db_name> TO ROLE masking_admin;
GRANT USAGE ON SCHEMA <db_name.schema_name> TO ROLE masking_admin;
GRANT USAGE ON WAREHOUSE <warehouse_name> TO ROLE masking_admin;
GRANT CREATE MASKING POLICY ON SCHEMA <db_name.schema_name> TO ROLE masking_admin;

-- assign the role to the data steward/data governance lead/security officer
use role useradmin;
grant role masking_admin to user dummy; -- replace dummy with the user ID of the data steward/data governance lead/security officer

-- grant the masking_admin role the privilege to use the external function
use role <function owner/SYSADMIN>;
GRANT USAGE ON FUNCTION string_tokenizer(VARCHAR, VARCHAR) TO ROLE MASKING_ADMIN;

  7. Now, we will create the masking policy using the external function. This can be done with the following set of commands: 

USE ROLE masking_admin;
-- create a masking policy so that PII_ANALYST users see the decrypted data while all other users see the encrypted data
create or replace masking policy <db_name.schema_name>.STRING_DECRYPT as (val string) returns string ->
  case
    when current_role() in ('PII_ANALYST') then string_tokenizer(val, 'decrypt')
    else val
  end;

  8. Once the masking policy is created, we will assign it to the tables containing the tokenized data. For this example, we will use a table named CUST_DATA, which has tokenized customer email addresses and SSNs. We will apply the masking policy as shown below.

-- use the role that owns the object
USE ROLE <table owner/SYSADMIN>;
alter table if exists CUST_DATA modify column CUST_EMAIL set masking policy <db_name.schema_name>.STRING_DECRYPT;
alter table if exists CUST_DATA modify column CUST_SSN set masking policy <db_name.schema_name>.STRING_DECRYPT;

Now, if we query the CUST_DATA table using the owner role or any other role that has SELECT access on it, we will see the tokenized form of the data. But when we query the table using the PII_ANALYST role, we see the decrypted original data.

Note: As a prerequisite, the PII_ANALYST role must have SELECT access on the table and USAGE access on the database and schema.
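
As a sketch, the behavior looks like this (the sample values are illustrative; actual tokens depend on the encryption key):

-- a role without PII access sees the stored Fernet tokens, e.g., 'gAAAAAB...'
USE ROLE SYSADMIN;
SELECT CUST_EMAIL, CUST_SSN FROM CUST_DATA;

-- the PII analyst sees the decrypted values, e.g., 'john.doe@example.com'
USE ROLE PII_ANALYST;
SELECT CUST_EMAIL, CUST_SSN FROM CUST_DATA;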

This is how we can implement column-level security on PII fields in Snowflake using Snowflake RBAC (Role-Based Access Control), AWS Lambda, Snowflake external functions, and masking policies.

Dynamic Data Masking

Dynamic Data Masking uses Snowflake built-in functions to mask sensitive data during query runtime. The schema-level masking policies are added to the columns in tables and views, which contain the functional logic to mask the data.

During query execution, the masking policy is applied to the column wherever it appears, determining whether Snowflake users see a plain-text value, a partially masked value, or a fully masked value based on the policy rules, SQL execution context, and role hierarchy.

How to Implement Dynamic Data Masking in Snowflake

We will implement Dynamic Data Masking on EMAIL and SSN columns in a table called CUST_INFO for this demo. We will use the same roles we used in the implementation example for External Tokenization in the section above.

  1. While creating the masking policy on the email field, we will do partial masking, i.e., retain the domain but mask the user name, and for the SSN field, we will do full masking. Below is the set of SQL statements required for this:

-- Create the masking policies
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY <db_name>.<schema_name>.email_mask AS (val string) RETURNS string ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
    ELSE REGEXP_REPLACE(SPLIT_PART(val, '@', 1), '([a-zA-Z0-9])', '*') || '@' || SPLIT_PART(val, '@', 2) -- mask the user name but retain the domain and punctuation
  END;

CREATE OR REPLACE MASKING POLICY <db_name>.<schema_name>.ssn_mask AS (val string) RETURNS string ->
  CASE
    WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN val
    ELSE REPEAT('*', LENGTH(val)) -- mask fully
  END;

  2. Then, we will apply the masking policies to the respective fields as shown below:

-- Apply the masking policies
alter table if exists <db_name>.<schema_name>.CUST_INFO modify column CUST_EMAIL set masking policy email_mask;
alter table if exists <db_name>.<schema_name>.CUST_INFO modify column CUST_SSN set masking policy ssn_mask;

Now, if we query using any role other than PII_ANALYST, we will see the masked data. However, if we use the PII_ANALYST role to query the data, we will see the original values.
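
For example (the sample values are illustrative):

-- any role other than PII_ANALYST sees masked values, e.g., '****.***@gmail.com' and '*********'
USE ROLE SYSADMIN;
SELECT CUST_EMAIL, CUST_SSN FROM CUST_INFO;

-- PII_ANALYST sees the original values, e.g., 'john.doe@gmail.com' and '123456789'
USE ROLE PII_ANALYST;
SELECT CUST_EMAIL, CUST_SSN FROM CUST_INFO;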

Advanced Column-Level Implementation

1. Using the Tokenization Function to Encrypt Inside Snowflake

If we want to use the same external tokenization function to encrypt PII data inside Snowflake, we can first create a masking policy for encryption, as shown below, and then follow Step 8 above to assign this new masking policy to the required tables and columns.

USE ROLE masking_admin;
-- PII_ANALYST sees the data as is after encryption, but all other roles see the encrypted value
create or replace masking policy <db_name>.<schema_name>.STRING_ENCRYPT as (val string) returns string ->
  case
    when current_role() in ('PII_ANALYST') then val
    else string_tokenizer(val, 'encrypt')
  end;

2. Tag-Based Masking Policy

A tag-based masking policy combines the object tagging and masking policy features: a masking policy is applied to a tag using the ALTER TAG command. When the data type in the masking policy signature matches the data type of a tagged column, that column is automatically protected according to the conditions defined in the policy. The policy conditions can be designed to secure column data based on the assigned tag name or tag value, and a single tag-based policy can replace the management of many individual masking policies.

We will use the scenario from above to demonstrate this. 

  1. First, we will create a tag with a set of allowed values corresponding to our sensitive data attribute types, and then assign the tag, with the respective values, to the EMAIL and SSN columns of the CUST_INFO table.

-- create the tag with its allowed values
CREATE OR REPLACE TAG <db_name>.<schema_name>.SENSITIVE_INFO
  ALLOWED_VALUES 'EMAIL', 'SSN', 'NAME', 'PHONE_NUMBER', 'BANK_ACCOUNT_NO', 'ADDRESS';

-- assign the tag to the required columns
ALTER TABLE <db_name>.<schema_name>.CUST_INFO MODIFY COLUMN CUST_EMAIL SET TAG SENSITIVE_INFO = 'EMAIL';
ALTER TABLE <db_name>.<schema_name>.CUST_INFO MODIFY COLUMN CUST_SSN SET TAG SENSITIVE_INFO = 'SSN';

If we have other tables and columns containing customer addresses, names, or bank account numbers, we can tag those columns in the same way using ALTER TABLE ... MODIFY COLUMN ... SET TAG with the corresponding tag values.

  2. Now, we will create the masking policy, in which we check the tag value on the column using SYSTEM$GET_TAG_ON_CURRENT_COLUMN and design our masking strategy based on it. For email and address we use partial masking, and for the rest of the fields we use full masking. Here's the SQL to do this:

-- Create the masking policy
USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN CURRENT_ROLE() IN ('PII_ANALYST') THEN VAL
ELSE
  CASE
    WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db_name>.<schema_name>.SENSITIVE_INFO') IN ('EMAIL')
      THEN REGEXP_REPLACE(SPLIT_PART(val, '@', 1), '([a-zA-Z0-9])', '*') || '@' || SPLIT_PART(val, '@', 2) -- mask the user name but retain the domain and punctuation

    WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db_name>.<schema_name>.SENSITIVE_INFO') IN ('SSN', 'NAME', 'PHONE_NUMBER', 'BANK_ACCOUNT_NO')
      THEN REPEAT('*', LENGTH(val)) -- mask fully

    WHEN SYSTEM$GET_TAG_ON_CURRENT_COLUMN('<db_name>.<schema_name>.SENSITIVE_INFO') IN ('ADDRESS')
      THEN REPEAT('*', CEIL(LENGTH(val) / 2)) || SUBSTR(val, CEIL(LENGTH(val) / 2) + 1) -- mask the first half of the address

    ELSE VAL
  END
END;

  3. Once the masking policy is created, all we need to do is assign it to the tag (see below).

ALTER TAG <db_name>.<schema_name>.SENSITIVE_INFO SET MASKING POLICY <db_name>.<schema_name>.SENSITIVE_INFO_MASK;

Hence, this approach enables ease of use, allowing us to manage multiple sensitive fields with a single policy. It also scales well: we can add the tag to any future tables/columns containing sensitive data, and that data will be masked from the point of creation.
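
For example, protecting a column in a hypothetical new table only requires tagging it (NEW_CUSTOMERS and PHONE are illustrative names):

-- any column tagged with SENSITIVE_INFO automatically inherits the masking policy
ALTER TABLE <db_name>.<schema_name>.NEW_CUSTOMERS MODIFY COLUMN PHONE SET TAG SENSITIVE_INFO = 'PHONE_NUMBER';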

3. Role Hierarchy and Context Functions

Since Snowflake RBAC allows roles to be granted to other roles, forming a hierarchy, we might need to design our masking policies so that roles inheriting the privileges of a role with PII access also get PII access.

  1. For example, in our use case, the PII_ANALYST role is granted to SYSADMIN. We would want any user with the SYSADMIN role to have the same or greater privileges than the PII_ANALYST role. However, because CURRENT_ROLE() returns only the role currently in use, using it in the masking policy would deny PII access to the parent roles of PII_ANALYST.

    We can prevent this by using the context function IS_ROLE_IN_SESSION('<role_name>'), which also accounts for inherited roles, as shown below.

USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN IS_ROLE_IN_SESSION('PII_ANALYST') THEN VAL
ELSE <masking logic>
END;

Now, if we query the CUST_INFO table using the SYSADMIN role, we will see the original data, since SYSADMIN inherits the PII_ANALYST role. 

However, any other role (e.g., DATA_ENGINEER) that is not in the role hierarchy of the PII_ANALYST role will not have the PII access privilege.
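
A quick sketch of the difference (DATA_ENGINEER is a hypothetical role outside the PII_ANALYST hierarchy):

-- inherits PII_ANALYST, so IS_ROLE_IN_SESSION('PII_ANALYST') returns TRUE: original values
USE ROLE SYSADMIN;
SELECT CUST_EMAIL FROM CUST_INFO;

-- not in the PII_ANALYST hierarchy: masked values
USE ROLE DATA_ENGINEER;
SELECT CUST_EMAIL FROM CUST_INFO;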

  2. Another scenario arises when we access objects that are not owned by the role we are using.

    For example, in our use case, suppose the PII_ANALYST role has created a view with PII fields like EMAIL and SSN. When other users query that view, they should not have access to the PII data unless otherwise specified. To implement this, we can use the INVOKER_ROLE() context function, as shown below.

USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN INVOKER_ROLE() IN ('PII_ANALYST') THEN VAL
ELSE <masking logic>
END;

However, if the user's role inherits the privileges of the PII_ANALYST role that created the view, then that role may need PII access too. In such a case, we can use the IS_GRANTED_TO_INVOKER_ROLE('<role_name>') context function, as shown below, to ensure that roles in the hierarchy of the view owner have the same PII privileges as the PII_ANALYST role.

USE ROLE masking_admin;
CREATE OR REPLACE MASKING POLICY SENSITIVE_INFO_MASK AS (VAL VARCHAR) RETURNS VARCHAR ->
CASE WHEN IS_GRANTED_TO_INVOKER_ROLE('PII_ANALYST') THEN VAL
ELSE <masking logic>
END;

Important Considerations

  1. A masking policy must have the same data type for its input and output, i.e., we cannot define the input data type as a number and return a string.

  2. Even when the policy's input and output data types match, if we assign that policy to a column of a different data type (e.g., a string-based masking policy assigned to a NUMBER column), the query will ignore the masking policy and return the data as is for that field.

  3. Before dropping a masking policy, you have to unset it from the column or tag using ALTER TABLE, ALTER VIEW, or ALTER TAG (see the sketch after this list).

  4. External Tokenization on a table with a huge volume of data might cause performance issues, since each query on a masked column triggers calls to the external function.

  5. When inserting data from a source column with a masking policy into a target column without one, users with privileges on the target column may see unmasked data if their role allows viewing unmasked values in the source. Conversely, if a user sees masked data in the source column and inserts it into the target column, the data remains masked, potentially leading to a mix of masked and unmasked values in the target column.
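
For the third consideration, here is a minimal sketch of the unset-then-drop sequence, reusing the object names from the Dynamic Data Masking example:

-- unset the policy from every column it is attached to, then drop it
ALTER TABLE <db_name>.<schema_name>.CUST_INFO MODIFY COLUMN CUST_EMAIL UNSET MASKING POLICY;
DROP MASKING POLICY <db_name>.<schema_name>.email_mask;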

Best Practices

  1. To ensure consistent data security, carefully apply masking policies, review queries involving masked columns before sharing tables or views, and identify all potential target columns where data from the source column might appear.

  2. Do not use NULL as the masking value, as that would cause COUNT and other aggregation queries run by a user/role who sees the masked data to return incorrect results. Using a non-NULL masking value allows such users/roles to run certain aggregate queries (e.g., a distinct count) with correct output and without the PII data being visible.

  3. The following table, adapted from the Snowflake documentation, summarizes some of the considerations for data governance strategies for masking sensitive data.

Policy Action             | Centralized Management | Hybrid Management | Decentralized Management
------------------------- | ---------------------- | ----------------- | ------------------------
Create policies           | Security Officer       | Security Officer  | Individual Teams
Apply policies to columns | Security Officer       | Individual Teams  | Individual Teams

Please note that your organization will need to decide on a governance structure for who creates and manages the masking policies and who assigns them to the necessary sensitive fields in the objects. 

Closing

Column-level security in Snowflake empowers organizations to protect sensitive data with precision, ensuring compliance and reducing risks. By leveraging features like masking policies and role-based access control, you can confidently share data while maintaining granular security. 

As you implement these practices, remember to regularly review and refine your security configurations to align with evolving needs and compliance requirements. 

If you're looking to beef up the security of your Snowflake account, we recommend attending one of our free Snowflake Security Assessments, which will help your organization identify vulnerabilities and offer best practices for safeguarding your data in Snowflake.

FAQs

What is Row-Level Security in Snowflake?

Row-Level Security (RLS) in Snowflake is a mechanism to control access to specific rows in tables or views based on dynamic conditions defined in Row Access Policies. These schema-level objects evaluate user roles and other criteria at query runtime to enforce data access rules for operations like SELECT, UPDATE, DELETE, and MERGE. This allows centralized governance, enabling fine-grained access control to protect sensitive data.
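
As a minimal sketch, a row access policy that only lets a hypothetical SALES_MANAGER role (and roles that inherit it) see rows might look like this:

-- rows are returned only when the policy expression evaluates to TRUE
CREATE OR REPLACE ROW ACCESS POLICY region_policy AS (region VARCHAR) RETURNS BOOLEAN ->
  IS_ROLE_IN_SESSION('SALES_MANAGER');

ALTER TABLE SALES ADD ROW ACCESS POLICY region_policy ON (REGION);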

What security features does Snowflake offer to protect data?

Snowflake offers advanced, industry-leading security features to safeguard your account, users, and stored data. The following topics, primarily designed for administrators (such as users with the ACCOUNTADMIN, SYSADMIN, or SECURITYADMIN roles), highlight key security measures:

  • Federated Authentication and SSO

  • Key-pair authentication and rotation

  • Multi-factor authentication (MFA)

  • Snowflake OAuth and External OAuth

  • Authentication policies

  • Network traffic control through network policies and rules

  • AWS VPC interface endpoints for internal stages

  • Azure private endpoints for internal stages

  • Integration with AWS PrivateLink, Azure Private Link, and Google Cloud Private Service Connect

  • Snowflake Sessions and Session Policies

  • Account-level parameter optimization

  • SCIM (System for Cross-domain Identity Management)

  • Role-based access control (RBAC)

  • End-to-end encryption

  • Encryption key management

  • Tri-Secret Secure

  • Trust Center

These features ensure comprehensive protection for your Snowflake environment.
