As we continue to rely increasingly on digital solutions, it’s hard to ignore the major security breaches at corporations like Equifax, Facebook, and First American Financial that make the news. Stories like these are part of the reason security concerns always come up when moving your enterprise to a new application or software suite.
Regardless of the headlines, security and access control should always be front of mind. Whether that is controlling what internal and external users can access or restricting access from others, there are many things you have to consider.
For Snowflake users in particular, these considerations include:
- How do I log in to the Snowflake Data Cloud using my enterprise credentials?
- How do I provision users in Snowflake?
- How do I manage user privileges in Snowflake?
- How do I deactivate users upon termination in Snowflake?
- How do I integrate my existing applications with Snowflake in a secure manner?
Though we frequently hear stories about someone gaining unauthorized access to an enterprise’s data, there are safeguards you can put in place to prevent it.
For example, automating access control reduces human error and lets you enforce least-privilege access, which limits the damage in the event of a breach. By integrating authentication and user management with the rest of your enterprise tooling, you can ensure that changes to your environment are reflected everywhere your data resides.
While most concepts in this blog post can be applied to any service provider, we’ll be specifically discussing the Snowflake Data Cloud integration throughout this post, taking a deep dive into the following:
- Authorization
- Authentication
- Secure authentication communication protocols
- Snowflake support for these communication protocols
- How the protocols compare with regard to Snowflake
- Syncing users and user information between your identity provider and Snowflake
Terms
Before we jump into how to integrate your existing authorization and authentication workflow with a service provider like Snowflake, it’s important to have a clear understanding of the difference between authorization and authentication.
While these are frequently used interchangeably, they are distinct security processes within identity and access management.
What is Authentication?
Authentication defines how a user is identified and validated (by some manner of credentials). Many applications accomplish this with a login page where a user enters a username and password.
Once those credentials are validated, information about the user is provided to the service provider, and they can access the system.
In enterprises that use services such as Microsoft Active Directory Federation Services (ADFS), users enter their credentials when signing in to their physical device, and those credentials are passed to the identity provider behind the scenes through federated authentication.
What is Authorization?
Authorization is the process of giving users permission to access a resource.
This is typically provided by issuing an access, identity, or security token after a user has authenticated. The token proves that the user has the authority to communicate with and access resources within your resource server or enterprise.
These rules are generally applied via group or user level policies and permissions.
What is Federated Authentication?
At a high level, federated authentication simply means that you’re relying on an identity provider (IdP) to authenticate your users instead of relying on Snowflake.
Snowflake and your IdP communicate for this integration using either the Security Assertion Markup Language 2.0 (SAML) or the OAuth 2.0 (OAuth) standard.
This communication stream allows users to authenticate with the IdP and return a token back to Snowflake. Snowflake then uses this token on each request to validate that the user has the appropriate access to perform the actions they submit.
Chances are your enterprise uses some sort of federated authentication today; common identity providers that support it include Google, ADFS, and Okta.
It’s important to note that using federated authentication is more complicated than just allowing a user to authenticate with an identity provider. Snowflake requires that users exist within Snowflake before a user is allowed to authenticate.
Your enterprise will need to change employees’ access to Snowflake for a variety of reasons: employees are hired and terminated, the groups they belong to change, and the permissions you want to associate with a group need to be updated.
Mechanisms must be in place to keep this data in sync between your identity provider and your service provider for a seamless user experience.
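To make "keeping this data in sync" concrete, here is a minimal sketch of the comparison any sync mechanism has to perform. The hypothetical helper below diffs the users your identity provider says should have access against the users that exist in Snowflake, and reports who needs to be created and who should be deactivated. It assumes usernames match one-to-one between the two systems and are unquoted Snowflake identifiers, which Snowflake stores in upper case. Fetching the two lists is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_create: frozenset      # in the IdP but missing from Snowflake
    to_deactivate: frozenset  # in Snowflake but no longer in the IdP


def plan_user_sync(idp_users, snowflake_users, exempt=frozenset()):
    """Diff IdP users against Snowflake users (hypothetical helper).

    Names are upper-cased because unquoted Snowflake identifiers are
    stored in upper case. `exempt` holds accounts (e.g. service users)
    that exist only in Snowflake and must never be deactivated.
    """
    idp = {u.upper() for u in idp_users}
    snowflake = {u.upper() for u in snowflake_users}
    keep = {u.upper() for u in exempt}
    return SyncPlan(
        to_create=frozenset(idp - snowflake),
        to_deactivate=frozenset(snowflake - idp - keep),
    )


plan = plan_user_sync({"alice", "bob"}, {"BOB", "CAROL", "ETL_SVC"}, exempt={"etl_svc"})
print(sorted(plan.to_create), sorted(plan.to_deactivate))  # ['ALICE'] ['CAROL']
```

The `exempt` set reflects a practical concern. Service accounts often exist only in Snowflake, and a naive diff would deactivate them.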
Authorization Protocols Overview
Security on the web takes on a lot of different forms and has many different protocols. This blog will focus on the two protocols that are supported by Snowflake: SAML and OAuth.
These protocols are used for communication between identity providers and applications such as Snowflake or tooling such as SnowSQL. While these two protocols are similar in what they accomplish, the way authentication and authorization work within them is very different.
Let’s take a deep dive into each of these protocols, how they are designed, and how your users and tooling interact with them.
SAML
The SAML open standard has been around for a long time and is very common for integrations. SAML is based on Extensible Markup Language (XML). When people refer to SAML, they are generally referring to SAML 2.0, which was approved by the OASIS Consortium in 2005. This standard improves on the SAML 1.1 specification that was approved in 2003.
Specific Terms
There are two providers involved with a SAML implementation: service providers and identity providers. Let’s define them.
An identity provider is responsible for managing authentication for your users. This is where user credentials are stored and validated. Once a user authenticates with the identity provider, it sends authentication and authorization messages to the service provider.
A service provider is the entity that is providing a service to the end user. In the case of Snowflake, the service provider is Snowflake. This service is responsible for starting the authentication request and receiving the response before letting the user access the service.
How does SAML work?
SAML relies on a shared configuration between the identity provider and service provider in order to communicate in a secure manner. To start the authentication process, a SAML request is made by the service provider and forwarded to the identity provider by the user’s browser.
This request is validated, and a response is issued by the identity provider containing user profile and group/role information. This response is then forwarded to the service provider.
This process is mapped out below:
SAML specifically relies on both the service provider and identity provider having a shared configuration that is set up when the two services are integrated. This includes configuring attributes, formatting, issuers, and an X.509 signing certificate.
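To make the request half of this exchange concrete, here is a minimal sketch of how a service provider can build a SAML 2.0 AuthnRequest and encode it for the standard HTTP-Redirect binding. The encoding is raw DEFLATE, then base64, then URL encoding. The sketch also shows how the identity provider reverses that encoding. The entity ID, assertion consumer service (ACS) URL, and SSO URL are hypothetical placeholders, and request signing is omitted. This illustrates the standard only; it is not how Snowflake builds its requests internally.

```python
import base64
import datetime
import uuid
import zlib
from urllib.parse import parse_qs, urlencode, urlparse

PROTOCOL_NS = "urn:oasis:names:tc:SAML:2.0:protocol"
ASSERTION_NS = "urn:oasis:names:tc:SAML:2.0:assertion"


def build_redirect_url(issuer: str, acs_url: str, idp_sso_url: str) -> str:
    """Build an unsigned AuthnRequest and encode it for the HTTP-Redirect binding."""
    request_id = "_" + uuid.uuid4().hex  # xs:ID values must not start with a digit
    issue_instant = datetime.datetime.now(datetime.timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # Values are assumed to be XML-safe; real code should escape them.
    xml = (
        f'<samlp:AuthnRequest xmlns:samlp="{PROTOCOL_NS}" xmlns:saml="{ASSERTION_NS}" '
        f'ID="{request_id}" Version="2.0" IssueInstant="{issue_instant}" '
        f'Destination="{idp_sso_url}" AssertionConsumerServiceURL="{acs_url}">'
        f"<saml:Issuer>{issuer}</saml:Issuer>"
        "</samlp:AuthnRequest>"
    )
    # Raw DEFLATE (negative wbits = no zlib header/trailer), then base64.
    compressor = zlib.compressobj(wbits=-15)
    deflated = compressor.compress(xml.encode("utf-8")) + compressor.flush()
    encoded = base64.b64encode(deflated).decode("ascii")
    return f"{idp_sso_url}?{urlencode({'SAMLRequest': encoded})}"


def decode_redirect_url(url: str) -> str:
    """What the identity provider does: undo URL encoding, base64, and DEFLATE."""
    encoded = parse_qs(urlparse(url).query)["SAMLRequest"][0]
    return zlib.decompress(base64.b64decode(encoded), -15).decode("utf-8")
```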
SAML Assertions
There are three types of SAML assertions: authentication, attribute, and authorization decision.
- Authentication assertions are used to identify a user and provide metadata about how that user authenticated.
- Attribute assertions are used to provide profile type information about a particular user.
- Authorization decision assertions are used to determine if a user is authorized to use a service.
These assertions are used throughout the user authentication lifecycle in a number of ways. When a user attempts to access your service provider (Snowflake in our case), the service provider checks whether the user has an active session. This varies by implementation, but generally, credentials have a time-to-live (TTL) that is set when the credentials are issued.
The TTL specifies how long the credentials are valid and provides a quick way to know when a user must re-enter their credentials. If the session has expired or there isn’t a current session, the user is prompted to log in and start the SAML authentication flow.
OAuth 2.0
The OAuth framework was originally developed with support from Twitter, Google, and a few other companies. OAuth 1.0 was published as an IETF RFC in 2010, and the framework underwent a substantial revision to OAuth 2.0 in 2012. Today, it is widely used by companies including Amazon, LinkedIn, Microsoft, and Netflix.
OAuth follows the same concepts as SAML with regard to having an identity provider and a service provider but differs greatly in implementation. We’ll compare the two later. First, let’s take a look at what the OAuth 2.0 authorization process looks like.
There are a lot of different resources and authorization workflows defined within the OAuth specification, each with a specific intended use case. OAuth aims to maximize security and prevent interception or man-in-the-middle attacks. While there are also different extensions of the OAuth specification, we will only be focusing on one for the purpose of this blog: OpenID Connect.
OpenID Connect
OpenID Connect (OIDC) is the most common protocol built on top of OAuth. The OAuth specification itself covers only authorization and leaves user authentication up to the implementer.
OIDC adds that missing layer of user authentication by issuing an ID token, formatted as a JSON Web Token (JWT), when a user authenticates. As a result, OIDC provides user authentication as well as user authorization.
OAuth Terms
To start, there are a few entities within the OAuth authorization process that we need to define.
The client is the application or service that wants to access a resource, usually on behalf of a user. In most grant types, the client is responsible for providing credentials to the authorization server.
The resource owner is the entity that owns the resource that the client is trying to access. It’s responsible for establishing access controls with the resource server that is serving the content to the client. The resource owner does not have visibility into the client’s credentials.
The resource server is responsible for validating the client’s authorization and applying the appropriate access controls set by the resource owner. The token’s validity can be checked by looking at claims within the JWT or by validating information provided by the authorization server. If the request is found to be valid, the server provides the resource to the client. If not, it rejects the request.
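Here is a minimal sketch of the claim checks a resource server might run once a token's signature has already been verified. It confirms the expected issuer and audience, checks that the token has not expired (with a small leeway), and optionally requires a scope. The issuer, audience, and scope values are hypothetical. Some identity providers put scopes in an `scp` list rather than a space-delimited `scope` string, so the sketch accepts either.

```python
import time
from typing import Optional


def validate_claims(claims: dict, *, issuer: str, audience: str,
                    required_scope: Optional[str] = None,
                    now: Optional[float] = None, leeway: int = 30) -> list:
    """Return a list of problems; an empty list means the claims look valid.

    Assumes the token's signature has ALREADY been verified.
    """
    now = time.time() if now is None else now
    problems = []
    if claims.get("iss") != issuer:
        problems.append("unexpected issuer")
    aud = claims.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        problems.append("wrong audience")
    exp = claims.get("exp")
    if exp is None or now > exp + leeway:
        problems.append("token expired")
    if required_scope is not None:
        scope = claims.get("scope", claims.get("scp", ""))
        granted = set(scope.split()) if isinstance(scope, str) else set(scope)
        if required_scope not in granted:
            problems.append("missing scope")
    return problems
```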
The authorization server is a third party responsible for authentication, authorization, and user management for the resource server. Clients provide credentials in a variety of forms (username/password or federated authentication to other identity providers), and those credentials are validated by the authorization server.
Grant Types
Now that we’ve defined the different entities involved in the OAuth specification, it’s important to understand grants. Grants (or grant types) are well-defined processes by which a client obtains an access token on behalf of a user or service.
Due to the nature of the web, there are varying levels of trust between your client and your resource server. This trust dictates the level of security required. For example, if a request is coming in from an unknown IP address (e.g., a user’s browser), you don’t have an inherent level of trust established, so you will need to use a more secure grant type.
However, if your own API was trying to access your resource server, you can use grant types that require fewer steps since you have full control of the client.
Let’s take a look at the different grant types defined by the OAuth 2.0 specification.
Authorization Code Grant
The Authorization Code Grant is the most secure grant type for browser and application-based authentication. To use this workflow, the client must be able to interact with the resource owner’s user-agent (typically a browser) and must be capable of receiving redirects.
Use this workflow whenever you cannot trust the client and your use case allows it. Visually, it looks like the following:
As you can see, there are many validation steps between the entities in the workflow. These ensure that the client can be trusted and is who the resource owner and server expect it to be.
Authorization codes are short-lived, one-time-use codes that the client exchanges with the authorization server for an access (or ID) token. Because a code can only be used once, intercepted data can’t simply be replayed, which adds another layer of security on top of usernames and passwords.
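The following minimal sketch shows the two client-side requests in this flow. The first is the authorization URL the user's browser is sent to. The second is the form the client posts to exchange the one-time code for a token. It also includes PKCE (Proof Key for Code Exchange, RFC 7636). PKCE is an extension not covered above, but it is widely recommended with this grant because it binds the code to the client that requested it. The endpoints, client ID, and redirect URI are hypothetical, and the HTTP calls themselves are omitted.

```python
import base64
import hashlib
import secrets
from urllib.parse import urlencode


def make_code_verifier() -> str:
    # 64 random bytes -> 86 URL-safe characters (RFC 7636 allows 43-128).
    return secrets.token_urlsafe(64)


def s256_challenge(verifier: str) -> str:
    """PKCE S256: BASE64URL(SHA256(verifier)) with padding removed."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def authorization_url(authorize_endpoint, client_id, redirect_uri, scope, state, verifier):
    """Step 1: send the user's browser here to sign in and consent."""
    params = {
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": scope,
        "state": state,  # random value echoed back; compare it to block CSRF
        "code_challenge": s256_challenge(verifier),
        "code_challenge_method": "S256",
    }
    return f"{authorize_endpoint}?{urlencode(params)}"


def token_request_form(code, redirect_uri, client_id, verifier):
    """Step 2: POST this form to the token endpoint to exchange the one-time code."""
    return {
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": redirect_uri,
        "client_id": client_id,
        "code_verifier": verifier,
    }
```

An attacker who intercepts only the code cannot redeem it without the verifier, which never leaves the client until the back-channel token request.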
Implicit Grant
The Implicit Grant is another commonly used grant type. Its flow looks similar to the Authorization Code Grant but removes the authorization code step. The Implicit Grant was once the general recommendation for browser-based applications, before the Authorization Code Grant took its place.
When a user authorizes with the Implicit Grant flow, an ID or access token is placed directly in the redirect URL (in its fragment). This has security implications, because the URL and token could be intercepted, and whoever intercepted them could access your resource server until the token expires.
With the Authorization Code flow, an individual could intercept the authorization code, but since it’s a one-time use for exchanging to an ID or access token, the interceptor wouldn’t be able to get your authorization credentials as easily. Contrasting that with the Implicit Grant, an individual would be able to directly intercept the ID or access token.
Therefore, it’s generally recommended to use the Authorization Code Grant over the Implicit Grant whenever possible.
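The difference is easy to see by parsing an Implicit Grant redirect. In the minimal sketch below, the token sits in plain sight in the URL fragment, so anything that can read the full URL (browser history, or a malicious script running on the page) gets a usable token. The URL and helper name are hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def params_from_redirect(redirect_url: str) -> dict:
    """Parse the fragment of an Implicit Grant redirect (hypothetical helper)."""
    fragment = urlparse(redirect_url).fragment
    return {key: values[0] for key, values in parse_qs(fragment).items()}


redirect = ("https://app.example.com/callback"
            "#access_token=abc123&token_type=Bearer&expires_in=3600&state=xyz")
print(params_from_redirect(redirect)["access_token"])  # abc123
```

With the Authorization Code Grant, the same position in the redirect holds only a one-time code, which is useless without the client's token request.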
Resource Owner Password Credentials Grant
The Resource Owner Password Credentials Grant is used when there’s a very high level of trust between the client and the resource owner. In this flow, the client is generally an API, and the user submits their credentials directly to your API.
The API then forwards those credentials via the Resource Owner Password Credentials Grant to your authorization server, which validates the request, and the resulting tokens are returned through the client to the user.
Since the client has to receive the user’s credentials and securely pass them to the authorization server, there must be a significant level of trust between the two entities. Your client (browser application and API) is also responsible for capturing the user’s credentials.
This exposes your enterprise to additional security risks such as JavaScript vulnerabilities and cross-site scripting (XSS). You would need to run security checks against your code regularly to ensure no malicious code has been injected into your website.
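The following minimal sketch of the token request body this grant uses shows where the risk comes from. The client builds the request itself, so it necessarily handles the user's plaintext password. The function name and values are hypothetical. Partly for this reason, the IETF's OAuth 2.0 Security Best Current Practice advises against using this grant at all.

```python
from urllib.parse import urlencode


def password_grant_form(username: str, password: str, scope: str = "") -> str:
    """Form body for the Resource Owner Password Credentials Grant.

    The client builds this itself, so it handles the user's plaintext password.
    """
    form = {"grant_type": "password", "username": username, "password": password}
    if scope:
        form["scope"] = scope
    return urlencode(form)
```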
Client Credentials Grant
In all of our previous grants, the clients are generally browser-based, with varying levels of trust between the client and the resource server. The Client Credentials Grant is typically used by service accounts or server-based processes where there’s complete trust between the client (itself a server-side process) and the authorization server.
The best use case for this is when APIs need to be able to communicate with other APIs.
This flow relies on using secrets in order to authenticate, and these secrets must be secure. If the secret was intercepted or accessed by another party, they would be able to generate tokens to access your system until those secrets were changed.
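The following minimal sketch shows what such a service-to-service token request contains. The client authenticates with its ID and secret via HTTP Basic authentication, following the OAuth 2.0 specification's client-authentication rules (form-encode each value, join with a colon, then base64). The body carries only the grant type and an optional scope. The client ID, secret, and scope are hypothetical. In practice, the secret would be loaded from a secrets manager or environment variable, never hard-coded.

```python
import base64
from urllib.parse import quote_plus, urlencode


def client_credentials_request(client_id: str, client_secret: str, scope: str = ""):
    """Return (headers, body) for a Client Credentials token request."""
    raw = f"{quote_plus(client_id)}:{quote_plus(client_secret)}"
    basic = base64.b64encode(raw.encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {basic}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    form = {"grant_type": "client_credentials"}
    if scope:
        form["scope"] = scope
    return headers, urlencode(form)
```

Anyone holding these two values can mint tokens, which is why the paragraph above stresses keeping the secret secure and rotating it if exposed.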
So Should I Use SAML or OAuth?
Like most things, it depends! Both SAML and OAuth are great protocols within identity management. SAML is based around user authentication and OAuth is based around authorization. Depending on your specific scenario, you may need one or both.
If you’re using an external identity provider such as Okta, you likely have the ability to use both SAML and OAuth for a variety of applications and integrations. With Okta specifically, we recommend OAuth because the specification offers more levels of trust and security, and you can configure a different authorization flow for each application.
If you’re also using OIDC, you have a single solution for both authentication and authorization. OIDC is becoming more and more common as large companies continue to adopt it as their integration protocol.
If you’re frequently integrating with legacy or on-premises systems, SAML is going to be more common, but you will need another layer for user authorization.
Integrating with Snowflake Security
Now that we’ve gone over the protocols that are supported by Snowflake and discussed when your enterprise would be better off using SAML or OAuth, let’s look into how each of these protocols integrates with Snowflake.
Federated Authentication
As previously detailed, federated authentication allows Snowflake to rely on a third-party identity provider for authentication. This allows you to integrate with existing authentication methods, separate responsibilities, and consolidate user management. Now that we know how federated authentication works for both SAML and OAuth, let’s jump into how they integrate with Snowflake.
SAML In Snowflake
For SAML-based authentication, Snowflake natively supports Okta and ADFS, which makes setup significantly easier because the integration is fully supported by Snowflake. If your enterprise uses Okta, there are pre-built application integrations that ease the implementation steps on the Okta side, and Snowflake provides a step-by-step guide to set it up.
Snowflake also provides a step-by-step guide for implementing federated authentication with ADFS.
If you’re using a SAML-based identity provider other than Okta or ADFS, you can find what is supported here. Once you’ve set up your identity provider to accept SAML authentication requests, you will have to configure Snowflake to send SAML requests to your identity provider. Snowflake also provides a guide for this configuration.
Snowflake provides many ways to access its service with SAML-based authentication, including:
- Browser
- SnowSQL
- Python Connector
- JDBC/ODBC Driver
- .NET
Snowflake has some limitations with SAML. If you aren’t using Okta, you are required to use a browser (or command prompt) to complete the SAML authentication process, and then those credentials are provided to the above tooling.
If you are using Okta, Snowflake has the ability to natively integrate with Okta, which allows you to programmatically authenticate via Okta’s API and pass the resulting credentials to Snowflake.
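These options surface as different connection settings in client tools. The following minimal sketch assumes the Snowflake Python connector (`snowflake-connector-python`). The hypothetical helper builds keyword arguments for `snowflake.connector.connect()` using the connector's documented `authenticator` values: `externalbrowser` for browser-based SSO with any SAML identity provider, your Okta URL for native Okta authentication, and `oauth` with a `token` for OAuth. The method labels, account identifier, and Okta URL are placeholders for illustration.

```python
def snowflake_connect_kwargs(account: str, user: str, method: str, **creds) -> dict:
    """Keyword arguments for snowflake.connector.connect() per sign-in method.

    The `method` labels ("browser", "okta", "oauth") are hypothetical.
    """
    kwargs = {"account": account, "user": user}
    if method == "browser":
        # Any SAML IdP: opens a browser window to complete the SSO flow.
        kwargs["authenticator"] = "externalbrowser"
    elif method == "okta":
        # Native Okta: the connector authenticates against Okta's API
        # with the user's Okta credentials (no browser needed).
        kwargs["authenticator"] = creds["okta_url"]  # e.g. https://<okta_account>.okta.com
        kwargs["password"] = creds["password"]
    elif method == "oauth":
        # OAuth: pass an access token obtained from your IdP.
        kwargs["authenticator"] = "oauth"
        kwargs["token"] = creds["token"]
    else:
        raise ValueError(f"unknown sign-in method: {method}")
    return kwargs


# Usage (requires the connector to be installed):
# import snowflake.connector
# conn = snowflake.connector.connect(**snowflake_connect_kwargs("myorg-myaccount", "ALICE", "browser"))
```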
Configuration
To use SAML within Snowflake, you will need to create a security integration with type SAML2 and provide Snowflake with the following SAML parameters:
- Type
- SSO_URL
- Label
- X.509 Certificate
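The following SQL sets the SAML identity provider through the account-level SAML_IDENTITY_PROVIDER parameter. This is the older mechanism, which Snowflake has since deprecated in favor of the SAML2 security integration described above; the integration takes equivalent values:
```sql
-- Requires the ACCOUNTADMIN role
use role AccountAdmin;
-- Replace the placeholder certificate, SSO URL, type, and label with your identity provider's values
alter account set saml_identity_provider = '{
  "certificate": "XXXXXXXXXXXXXXXXXXX",
  "ssoUrl": "https://abccorp.host.com/adfs/ls",
  "type": "ADFS",
  "label": "ADFSSingleSignOn"
}';
```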
There are also a number of optional parameters that you may need to configure depending on your environment. You will also need to create this same SAML configuration within your identity provider.
OAuth In Snowflake
For OAuth-based authentication, there are a few different options. Snowflake supports three external OAuth servers, a custom integration, and two partner applications. This gives you a lot of flexibility in how you configure your Snowflake instance, and where your users are authenticating from.
Most commonly, you’re going to be using an external identity provider such as Okta, ADFS, or Azure Active Directory. You also have the option of using Snowflake as your identity provider, as Snowflake provides its own OAuth server!
Once you’ve set up your OAuth workflow, it’s important to note that Snowflake expects the request to have certain parameters. Namely, you will need to define the OAuth scopes that you’re authenticating a user with.
Scopes limit the operations and roles permitted to be returned in the user’s access token, functionally limiting the capabilities they have in Snowflake.
Snowflake provides documentation on how to perform this mapping.
The recommended practice is to return a scope mapped to the Snowflake role you want the user to have for their session, in the form session:role:&lt;role_name&gt;. Snowflake can also let a session use any role granted to the user instead of one specific role, but that choice must likewise be expressed in the returned scope.
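The following minimal sketch shows the role-to-scope mapping as string handling. It builds a role scope in the `session:role:<role_name>` form and extracts role names from a space-delimited scope string such as one returned in a token. The role names are hypothetical. Check Snowflake's OAuth documentation and your identity provider's configuration for the exact scopes it emits and how role-name case is treated.

```python
ROLE_SCOPE_PREFIX = "session:role:"


def role_scope(role: str) -> str:
    """Scope value asking for a Snowflake session that uses `role`."""
    return ROLE_SCOPE_PREFIX + role


def roles_in_scope(scope: str) -> list:
    """Extract role names from a space-delimited OAuth scope string."""
    return [s[len(ROLE_SCOPE_PREFIX):] for s in scope.split() if s.startswith(ROLE_SCOPE_PREFIX)]
```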
Within your identity provider, you will need to create an application for integration into Snowflake. This provides the authorization server a set of data that defines who and how clients can authenticate with your application.
If you’re using OIDC, you may also need to set up your public key (JWKS) and claims endpoints; Okta and many other identity providers provide these by default. You will also need to configure Snowflake to use either an internal (Snowflake OAuth) or External OAuth security integration.
Snowflake provides OAuth support for the following integrations:
- SnowSQL
- Python
- Go
- JDBC/ODBC
- Spark Connector
Snowflake OAuth
Up until now, we’ve been talking about having a separate identity provider and resource owner/server. However, Snowflake has the ability to operate as all of the above! Snowflake has an internal OAuth setup that directly integrates with its UI and tooling for ease of access.
This allows you to provision your users within Okta, have users sign into Okta via the authorization code grant, and have federated authentication to other resource owners. While this is an option, generally, most enterprises will have an identity provider other than Snowflake that they’re integrating Snowflake with.
Key Pair Authentication
While a federated authentication strategy is generally the recommended approach for enterprises, Snowflake also lets users authenticate directly.
With basic authentication, users provide their username and password directly to Snowflake. Snowflake also supports key pair authentication, a stronger option in which an RSA public key is assigned to the user’s account and Snowflake stores its fingerprint. When the user authenticates, their client must prove possession of the matching RSA private key.
Automated User and Role Management
Now that we’ve covered how to have users log in via federated authentication, let’s take a look at managing users and roles between your identity provider and Snowflake.
Although Snowflake does support authentication federation, accounts still need to be provisioned within Snowflake (along with databases, schemas, and roles, as well as your information architecture). Snowflake provides some tooling out of the box using System for Cross-domain Identity Management (SCIM), and phData has created the Provision Tool to automate resource management within Snowflake.
SCIM
SCIM is an open standard for facilitating automated management of users and their groups/roles using RESTful APIs. These APIs support HTTP methods such as GET, POST, and PATCH to access or modify user information, including creating and deleting users, updating attributes, and updating groups.
When you integrate SCIM between your identity provider and service provider(s), you’ll be able to react to changes within your identity provider as they happen!
Snowflake provides native SCIM support for Okta and Azure Active Directory, but you can manually configure SCIM for ADFS and other providers as well. The identity provider uses a SCIM client to make the RESTful API request to the Snowflake SCIM server. Upon validating the API request, Snowflake performs actions on the user or group. For more information, you can visit Snowflake’s documentation.
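The following minimal sketch shows the kind of request a SCIM client sends to create a user, using the core User schema from the SCIM 2.0 standard. It builds a `urllib.request.Request` without sending it. The base URL (for example, a hypothetical `https://<account>.snowflakecomputing.com/scim/v2`), the bearer token, and the user details are placeholders. Check Snowflake's SCIM documentation for the exact endpoint and the attributes it supports.

```python
import json
import urllib.request

SCIM_USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"


def scim_create_user_request(base_url: str, bearer_token: str, user_name: str,
                             given_name: str, family_name: str, email: str) -> urllib.request.Request:
    """Build (but do not send) a SCIM 2.0 POST /Users request."""
    payload = {
        "schemas": [SCIM_USER_SCHEMA],
        "userName": user_name,
        "name": {"givenName": given_name, "familyName": family_name},
        "emails": [{"value": email, "primary": True}],
        "active": True,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/Users",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {bearer_token}",
                 "Content-Type": "application/scim+json"},
        method="POST",
    )
```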
Snowflake uses SCIM for the following:
- Creating and activating users in Snowflake.
- Managing groups/roles a user has access to.
- Updating user attributes.
- Deactivating users upon termination in Snowflake.
PRO TIP: When a user is removed from your existing authentication (Okta/ADFS/Azure Active Directory), the user won’t be able to authenticate via federated authentication, but it’s best practice to disable their Snowflake account as well.
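Deactivation is typically a SCIM PATCH that sets the user's `active` attribute to false, which disables the account without deleting it. The sketch below builds such a request using the standard SCIM PatchOp message. As before, the base URL, token, and user ID (the SCIM ID that Snowflake assigned to the user) are hypothetical placeholders.

```python
import json
import urllib.request

SCIM_PATCH_SCHEMA = "urn:ietf:params:scim:api:messages:2.0:PatchOp"


def scim_deactivate_user_request(base_url: str, bearer_token: str, user_id: str) -> urllib.request.Request:
    """Build (but do not send) a SCIM 2.0 PATCH that sets active=false."""
    payload = {
        "schemas": [SCIM_PATCH_SCHEMA],
        "Operations": [{"op": "replace", "value": {"active": False}}],
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/Users/{user_id}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Authorization": f"Bearer {bearer_token}",
                 "Content-Type": "application/scim+json"},
        method="PATCH",
    )
```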
In order to map groups from the identity provider to roles that exist within Snowflake, you will need to create those groups within your identity provider. These groups must have an id attribute that matches the Snowflake role id. This is how Snowflake maps the user’s groups to its roles.
An important thing to note about SCIM within Snowflake is that all changes to a user or their groups must be made within the identity provider. If you make changes to the user and their groups within Snowflake directly, those changes will not be replicated back to your identity provider.
Provision Tool
phData created the Provision Tool with the goal of automating user onboarding and information architecture. Within the Provision Tool, you can define groups and models with a template-based approach. Each member of the group has the model applied to them. This allows you to define what your user’s resources should look like and automatically generate (and execute) the Snowflake SQL necessary to create those users.
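The following minimal sketch illustrates the template-based idea. It is not the Provision Tool's actual model format. A hypothetical group "model" (a role plus a default warehouse) is applied to every group member to render Snowflake SQL. Identifiers are validated against a conservative pattern before being interpolated, because generated SQL should never trust raw input. The group, role, and warehouse names are placeholders.

```python
import re
from dataclasses import dataclass

_SAFE_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def _ident(name: str) -> str:
    """Accept only plain unquoted identifiers; Snowflake stores these upper-cased."""
    if not _SAFE_IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name.upper()


@dataclass
class GroupModel:  # hypothetical template applied to every member of a group
    role: str
    default_warehouse: str


def render_user_sql(members: list, model: GroupModel) -> list:
    role = _ident(model.role)
    warehouse = _ident(model.default_warehouse)
    statements = [f"CREATE ROLE IF NOT EXISTS {role};"]
    for member in members:
        user = _ident(member)
        statements.append(
            f"CREATE USER IF NOT EXISTS {user} DEFAULT_ROLE = {role} DEFAULT_WAREHOUSE = {warehouse};"
        )
        statements.append(f"GRANT ROLE {role} TO USER {user};")
    return statements
```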
This can either be used to manually specify each user account in Snowflake, or you can integrate the Provision Tool with ADFS to automatically map your users and groups in ADFS to a given model.
The Provision Tool deploys changes through pull requests that engineers open against a repository, and it applies those changes when a pull request is merged. This gives your enterprise governance and auditability for changes not only to users but to your entire Snowflake environment.
It provides automation and governance around the following:
- Creating and activating users in Snowflake
- Managing groups/roles a user has access to
- Updating user attributes
- Deactivating users in Snowflake upon removal from ADFS groups
- Provisioning/Deprovisioning
  - Warehouses
  - Databases
  - Schemas
  - Stages
  - Resource Monitors
  - Users
  - Roles
  - Grants
- Synchronization with ADFS
- Caching performance optimizations
- Capturing data drift of unmanaged resources
You can check out more on the Provision Tool here.
How Immuta Helps
One of the additional benefits of integrating Snowflake with an external authorization system is that you can leverage tools like Immuta to simplify the management of data access and security policies in Snowflake.
Immuta is a data security platform that enables organizations to manage, create, and audit Snowflake data. Once integrated with your identity management provider, Immuta automatically scans, classifies, and monitors your Snowflake data so you can easily define security policies and fine-grained access controls based on attributes and roles from your identity system.
These policies leverage native functionality in Snowflake by orchestrating the access grants and security policies you’d otherwise have to build by hand.
Immuta uses SCIM providers to update user access and permissions in near-real time, allowing you to have more data in Snowflake without the burden of managing Snowflake table and row-level policy controls.
Logging into Immuta, users can quickly see who is accessing data ranked by popularity, sensitivity, and potential for misuse:
Immuta uses the identity provider set up in Snowflake to locate and highlight potential risks so you can write rules to govern access and security.
Immuta’s natural language policy (NLP) builder allows your business users to define who gets access to data and what they can see when they run queries, with no technical expertise required. This means that your data engineering teams won’t be the bottleneck that limits the amount of data used to generate insights.
For example, to create a masking policy in Snowflake, a data engineer would need to build something like this that depends on roles existing in the identity system:
With Immuta, you can open up policy creation to various lines of business based on user attributes from the identity system, so the process is straightforward and scalable. Let’s look at it in practice.
In the example below, we are using attributes from Okta that denote a user’s department. Personally identifiable information (PII) in Snowflake will be masked for everyone except those who are in the HR department. This department is a user attribute set in Okta, not a role, meaning that anyone who gets added to or removed from the HR department will automatically have the appropriate access permissions without any coding in Snowflake.
The above policy is set to use classification, meaning that this one policy can apply to any table in Snowflake that is classified as an HR table.
Now, if a user runs a query in Snowflake, a native Snowflake masking policy has been applied without the user having to write any SQL code:
Answering Our Initial Questions
Let’s come back to our original questions we asked around authentication, user provisioning and management, and integration now that we’ve described all the options.
How do I log in to the Snowflake Data Cloud using my enterprise credentials?
You will need to configure your identity provider and Snowflake to integrate with each other. You can use SAML or OAuth for this, but generally, we recommend OAuth with OIDC. The configurations, limitations, and setup steps depend on which protocol you choose. Snowflake has a native integration with Okta, so if you have the option, use Okta as your identity provider.
How do I provision users in Snowflake?
This will depend on whether you’re using SCIM or not. It’s recommended to use SCIM when available as it automates the creation and management of your users and their access. If SCIM is not available, then you will need to manually create your users, grant privileges, assign roles, and manage credentials. This is a great use case for the Provision Tool, since it automates these manual tasks and provides ADFS connectivity and management of users.
How do I manage user privileges in Snowflake?
This depends on whether you’re using SCIM within your Snowflake instance. It’s recommended to use SCIM when available.
If you’re using SCIM and managing your groups within your identity provider, you manage user privileges by assigning users to groups that map to your Snowflake roles. If you’re doing this manually, you will need to either script or manually assign roles and privileges to your users. This is another reason we built the Provision Tool, as it auto-generates the SQL needed for these actions and can apply it automatically.
Within Snowflake, it’s recommended and encouraged to organize a role hierarchy that encapsulates the permission(s) that a given role within your organization should have. A user should then have one or many roles assigned to them in Snowflake.
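A role hierarchy is ultimately a set of `GRANT ROLE ... TO ROLE ...` statements. The following minimal sketch turns a hypothetical hierarchy, written as a mapping from each parent role to its child roles, into those statements, and rejects cycles before any SQL is produced. The role names are placeholders and are assumed to be trusted, unquoted identifiers. Rolling custom roles up to SYSADMIN, as in the test data, follows a commonly recommended Snowflake pattern.

```python
def hierarchy_grants(children_of: dict) -> list:
    """Turn {parent_role: [child_roles]} into GRANT ROLE statements.

    Raises ValueError if the hierarchy contains a cycle.
    """
    state = {}  # role -> "visiting" while on the DFS stack, "done" afterwards

    def visit(role):
        if state.get(role) == "visiting":
            raise ValueError(f"role hierarchy cycle through {role}")
        if state.get(role) == "done":
            return
        state[role] = "visiting"
        for child in children_of.get(role, []):
            visit(child)
        state[role] = "done"

    for role in children_of:
        visit(role)
    return [
        f"GRANT ROLE {child.upper()} TO ROLE {parent.upper()};"
        for parent, children in children_of.items()
        for child in children
    ]
```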
How do I deactivate users upon termination in Snowflake?
If you’re using SCIM, when a user is removed from your identity provider, the user will automatically be deactivated in your Snowflake instance as well. If you’re managing users manually, you will need to write and execute the appropriate SQL statements to deactivate the user and remove their permissions. This is another great use case for the Provision Tool, as it auto-generates the SQL needed for these actions and can apply it automatically.
How do I integrate my existing applications with Snowflake in a secure manner?
You will need to follow the steps for setting up either SAML or OAuth within Snowflake and your identity provider. Once these have been configured, you will have federated authentication between all your applications and single sign-on functionality. If you don’t have a requirement to use SAML or OAuth specifically, default to OAuth with OIDC, as it’s supported by most major websites.
Conclusion
There are a number of different ways to integrate your existing authentication strategies with Snowflake. The strategy your enterprise should choose largely depends on what your existing workflows are and what tooling you have available.
We recommend using OAuth 2.0 with OIDC if you have both options available. This allows for more flexibility and is easier to set up than SAML. When it comes to user management, you will need to set up SCIM for an automated workflow. If SCIM isn’t available, you should consider using a free solution like the Provision Tool to help consolidate, govern, and audit the changes to your Snowflake environment in a consistent manner.
What’s Next?
Now that you’ve worked your way through all the critical decisions that need to be made, you are ready to build out your authentication and authorization strategy for Snowflake!
Once that’s completed, we recommend designing your Snowflake role hierarchy to further improve your security by ensuring your users only have access to the information they need. And, by setting this up correctly, you can eliminate manual work, increase time-to-value, improve outcomes, and lower costs.