The Snowflake Data Cloud has a number of powerful features that empower organizations to make more data-driven decisions.
In this blog, we’re going to explore Snowflake’s Dynamic Data Masking feature in detail, including what it is, how it helps, and why it’s so important for security purposes.
What is Snowflake Dynamic Data Masking?
Snowflake Dynamic Data Masking (DDM) is a data security feature that allows you to alter sections of data (from a table or a view) to keep their anonymity using a predefined masking strategy.
Data owners can decide how much sensitive data to reveal to different data consumers or data requestors using Snowflake’s Dynamic Data Masking function, which helps prevent accidental and intentional threats. It’s a policy-based security feature that keeps the data in the database unchanged while hiding sensitive data (i.e. PII, PHI, PCI-DSS), in the query result set over specific database fields.
For example, a call center agent may be able to identify a customer by checking the final four characters of their Social Security Number (SSN) or PII field, but the entire SSN or PII field of the customer should not be shown to the call center agent (data requester).
Dynamic Data Masking (also known as on-the-fly data masking) policy can be specified to hide part of the SSN or PII field so that the call center agent (data requester) does not get access to the sensitive data. On the other hand, an appropriate data masking policy can be defined to protect SSNs or PII fields, allowing production support members to query production environments for troubleshooting without seeing any SSN or any other PII fields, and thus complying with compliance regulations.
Figure 1: Data Masking Using Masking Policy in Snowflake
The intention of Dynamic Data Masking is to protect the actual data and substitute or hide where the actual data is not required to non-privileged users without changing or altering the data at rest.
Static vs Dynamic Data Masking
There is two types of data masking: static and dynamic. By modifying data at rest, Static Data Masking (SDM) permanently replaces sensitive data. Dynamic Data Masking (DDM) strives to replace sensitive data in transit while keeping the original data at rest intact and unchanged. The unmasked data will remain visible in the actual database. DDM is primarily used to apply role-based (object-level) security for databases.
Reasons for Data Masking
Data is masked for different reasons. The main reason here is risk reduction, and according to guidelines set by the security teams, to limit the possibility of a sensitive data leak. Data is also masked for commercial reasons such as masking of financial data that should not be common knowledge, even within the organization. There is a Compliance reason and it is driven by requirements or recommendations based on specific standards, and regulations like GDPR, SOX, HIPAA, and PCI DSS.
The projects are usually initiated by data governance or compliance teams. There are requirements from the privacy office or legal team where personally identifiable information should be protected.
How Dynamic Data Masking Works in Snowflake
In Snowflake, Dynamic Data Masking is applied through masking policies. Masking policies are schema-level objects that can be applied to one or more columns in a table or a view (standard & materialized) to selectively hide and obfuscate according to the level of anonymity needed.
Once created and associated with a column, the masking policy is applied to the column at query runtime at every position where the column appears.
Figure 2: How Dynamic Data Masking Works in Snowflake
Masking Policy SQL Construct
To apply Dynamic Data Masking, the masking policy objects need to be created. Like many other securable objects in Snowflake, the masking policy is also a securable and schema level object.
The following is an example of a simple masking policy that masks the SSN number based on a user’s role.
Figure 3: Dynamic Data Masking SQL construct
-- creating a normal dynamic masking policy
create or replace masking policy mask_ssn as (ssn_txt string)
returns string ->
case
when current_role() in ('SYSADMIN')
then ssn_txt
when current_role() in ('CALL_CNETER_AGENT') then regexp_replace(ssn_txt,substring(ssn_txt,1,7),'xxx-xx-')
when current_role() in ('PROD_SUPP_MEMBER') then 'xxx-xx-xxxx'
else '***Masked***'
end;
The masking policy name, “mask_ssn” is the unique identifier within the schema and the signature for the masking policy specifies the input columns in this example “ssn_txt” alongside data type(string) to evaluate at query runtime. The return data type must match the input data type followed by the SQL expression that transforms or mask the data which is ssn_txt in this example. The SQL expression can include a built-in function or UDF or conditional expression functions (like CASE in this example).
In the above example, the SSN is partially masked if the current role of the user is CALL_CENTER_AGENT. If the user role is PROD_SUPP_MEMBER, then it replaces all the numeric characters with character x. For any other roles, it returns NULL.
Once the masking policy is created, it needs to be applied to a table or view column. This can be done during the table or view creation or using an alter statement.
Figure 4: How to apply dynamic data masking to a column
-- Customer table DDL & apply masking policy
create or replace table customer(
id number,
first_name string,
last_name string,
DoB string,
ssn string masking policy mask_ssn,
country string,
city string,
zipcode string);
-- For an existing table or view, execute the following statements:
alter table if exists customer modify column ssn set masking policy mask_ssn;
Once the masking policy is applied, and a user (with a specific role) queries the table, the user will see the query result as shown below.
Figure 5: Data Masking applied at query run time
Multiple Masking Policies Example
We can create multiple masking policies and apply them to different columns at the same time.
In the previous example, we masked the customer table’s SSN column. We can create additional masking policies for first_name, last_name, and date of birth columns and alter the customer table and apply additional masking policies.
-- masking policy to mask first name
create or replace masking policy mask_fname as (fname_txt string) returns string ->
case
when current_role() in ('CALL_CNETER_AGENT') then 'xxxxxx'
when current_role() in ('PROD_SUPP_MEMBER') then 'xxxxxx'
else NULL
end;
-- apply mask_fname masking policy to customer.first_name column
alter table if exists customer modify column first_name set masking policy mydb.myschema.mask_fname;
-- masking policy to mask last name
create or replace masking policy mydb.myschema.mask_lname as (lname_txt string) returns string ->
case
when current_role() in ('CALL_CNETER_AGENT') then lname_txt
when current_role() in ('PROD_SUPP_MEMBER') then 'xxxxxx'
else NULL
end;
-- apply mask_lname masking policy to customer.last_name column
alter table if exists mydb.myschema.customer modify column last_name set masking policy mydb.myschema.mask_lname;
-- masking policy to mask date of birth name
create or replace masking policy mydb.myschema.mask_dob as (dob_txt string) returns string ->
case
when current_role() in ('CALL_CNETER_AGENT') then regexp_replace(dob_txt,substring(dob_txt,1,8),'xxxx-xx-')
when current_role() in ('PROD_SUPP_MEMBER') then 'xxxx-xx-xx'
else NULL
end;
-- apply mask_dob masking policy to customer.dob column
alter table if exists mydb.myschema.customer modify column dob set masking policy mydb.myschema.mask_dob;
Once these masking policies are created & applied, and a user (with a specific role) queries the table, the user will see the query result as shown below.
Figure 6: Multiple Data Masking Policies Applied Example
Dynamic Masking & Run Time Query Execution
The best aspect of Snowflake’s data masking strategy is that end users can query the data without knowing whether or not the column has a masking policy. Whenever Snowflake discovers a column with a masking policy associated, the Snowflake query engine transparently rewrites the query at runtime.
For authorized users, query results return sensitive data in plain text, whereas sensitive data is masked, partially masked, or fully masked for unauthorized users.
If we take our customer data set where masking policies are applied on different columns, a query submitted by a user and the query executed after Snowflake rewrites the query automatically looks as follows.
Query Type | Query Submitted By User | Rewritten Query by Snowflake |
---|---|---|
Simple Query | Select dob, ssn from customer; | Select mask_dob(dob), mask_ssn(ssn) from customer; |
Query with where clause predicate | Select dob, ssn from customer where ssn = ‘576-77-4356’ | Select dob, ssn from customer where mask_ssn(ssn) = ‘576-77-4356’ |
Query with joining column & where clause predicate | Select first_name, count(1) from customer c join orders o on c.first_name = o.first_name Group by c.first_name; | Select mask_fname(first_name), count(1) from customer c join orders o on mask_fname(c.first_name) = o.first_name Group by mask_fname(c.first_name); |
The rewrite is performed in all places where the protected column is present in the query, such as in “projections”, “where” clauses, “join” predicates, “group by” statements, or “order by” statements.
Conditional Masking Policy in Snowflake
There are cases where data masking on a particular field depends on other column values besides user roles. To handle such a scenario, Snowflake supports conditional masking policy, and to enable this feature, additional input parameters can be passed as an argument along with data type.
Let’s say a user has opted to show his/her educational detail publicly but this flag is false for many other users. In such a case, the user’s education detail will be masked only if the public visibility flag is false, else this field will not be masked.
Figure 7: Conditional Data Masking Policies SQL Construct
-- DDL for user table
create or replace table user
(
id number,
first_name string,
last_name string,
DoB string,
highest_degree string,
visibility boolean,
city string,
zipcode string
);
-- User table sample dataset
insert into user values
(100,'Francis','Rodriquez','1988-01-27','Graduation',true,'Atlanta',30301),
( 101,'Abigail','Nash','1978-09-18', 'Post Graduation',false,'Denver',80201),
( 102,'Kasper','Short','1996-07-29', 'None',false,'Phoenix',85001);
– create conditional masking policy using visibility field
create or replace masking policy mask_degree as (degree_txt string,visibility boolean) returns string ->
case
when visibility = true then degree_txt
else '***Masked***'
end;
– apply masking policy
alter table if exists mydb.myschema.user modify column highest_degree set masking policy mydb.myschema.mask_degree using (highest_degree,visibility);
Once this conditional masking policy is created & applied, and a user (with a specific role) queries the table, the user will see the query result as shown below.
Figure 8: Conditional Data Masking Example
What Are The Benefits to Dynamic Data Masking?
- A new masking policy can be created quickly and easily with no overhead of historic loading of data.
- You can write a policy once and have it apply to thousands of columns across databases and schemas.
- Masking policies are easy to manage and support centralized and decentralized administration models.
- Easily mask data before sharing.
- Easily change masking policy content without having to reapply the masking policy to thousands of columns.
Points to Remember When Working With Data Masking
- Masking policies carry over to cloned objects.
- Masking policies cannot be applied to virtual columns (external table). If you need to apply a data masking policy to a virtual column, you can create a view on the virtual columns, and apply policies to the view columns.
- Since all columns of an external table are virtual except the VALUE variant column, you can apply a data masking policy only to the VALUE column.
- Materialized views can’t be created on table columns with masking policies applied. However, you can apply masking policies to materialized view columns as long as there’s no masking policy on the underlying columns.
- The Result Set cache isn’t used for queries that contain columns with masking policies.
- A data sharing provider cannot create a masking policy in a reader account.
- A data sharing consumer cannot apply a masking policy to a shared database or table
- Future granting of masking policy permissions is not supported.
- To delete a database or schema, the masking policy and its mapping must be self-contained within the database or schema.
Conclusion
Snowflake’s Dynamic Data Masking is a very powerful feature that allows you to bring all kinds of sensitive data into your data platform and manage it at scale.
Snowflake’s policy-based approach, along with role-based access control (RBAC), allows you to prevent sensitive data from being viewed by table/view owners and users with privileged responsibilities.
If you’re looking to take advantage of Snowflake’s Dynamic Data Masking feature, the data experts at phData would love to help make this a reality. Feel free to reach out today for more information.