February 28, 2025

Snowflake Git Integration – Snowpark CICD Lifecycle

By Deepa Ganiger

CI/CD (Continuous Integration and Continuous Delivery) is critical to any organization’s development lifecycle. Various tools like Flyway, Schemachange, Jenkins, and GitHub Actions are commonly used to implement CI/CD for Snowflake objects like tables, views, and SQL stored procedures. While these tools can also be leveraged for Snowpark objects, the introduction of Snowflake Git Integration simplifies the process even further, offering a more streamlined and flexible approach to managing Snowpark objects.

This blog covers:

  • Integrating Snowflake with GitHub using API integration.

  • Creating a sample Snowpark stored procedure that sources code directly from GitHub.

  • Implementing the CI/CD lifecycle for Python code in a development environment.

CICD Lifecycle With Git Integration

Traditionally, Snowpark code is either written inline (in Python or Java) or uploaded to a stage and referenced during procedure creation. During deployment, tools like Jenkins or GitHub Actions must therefore copy the code to that stage location. Integrating Git with Snowflake opens up different ways to handle the CI/CD lifecycle during development and production.

  • A stored procedure in the production environment can point to the “main” branch of Git, while the test/dev environment can point to different branches.

  • Developers can refer to feature or development branches while making changes.

  • Once the developer completes and tests the change in the feature branch, the code can be promoted to the test and main branches. Refreshing the repository stage then simply refreshes the code in production, as sketched below.
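For instance, the same procedure definition can point at a different branch per environment just by changing the repository stage path in its IMPORTS clause. The snippet below is illustrative (the full syntax is covered later in this post; feature_x is a hypothetical branch name):

-- Production: source the handler from the main branch
IMPORTS = ('@admin_db.integrations.snowflake_snowpark_utilities/branches/main/common/filter_table.py')

-- Dev/test: the same definition pointing at a feature branch (feature_x is hypothetical)
IMPORTS = ('@admin_db.integrations.snowflake_snowpark_utilities/branches/feature_x/common/filter_table.py')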

Let’s look at how these can be achieved using the Git Integration feature.

Configure Snowflake Account to Integrate With GitHub

The objects can be set up securely by configuring different roles and Snowflake objects to interact with Git, as outlined below. This requires setting up an API integration and a Snowflake secret to store the Git credentials. Secrets and Git repository objects are schema-level objects, while the API integration is an account-level object; all of them can be secured with different roles.

The overall configuration can be designed as below.

Roles:

  • secrets_admin: Owns the secrets/credentials to GitHub

  • git_admin: Owns the API integration and repository

  • developer: Develops code and commits it to the code repository

Databases:

  • admin_db: Database that houses the secret, integration, and repository objects (in the integrations schema)

  • demo_db: Database where the application's stored procedures reside.

The overall process flow for administration will be as follows.

  • The Git Administrator creates Git credentials (access tokens) and hands them to the secret administrator (secrets_admin).

  • secrets_admin will then create a secret object within the admin_db.integrations schema.

  • secrets_admin will grant USAGE on the secret to the git_admin role.

  • The user with the git_admin role will then create two objects within the admin_db database:

    • An API integration object, which connects to GitHub using the secret provided.

    • A Git repository object, which acts as an intermediate stage where Python code from Git resides within Snowflake. Stored procedures refer to this code in the stage object.

  • Developers will need access to this repository object to refer to it in the Snowpark code.

Once this is set up, the overall administration activity is complete, and the Snowflake account will be integrated with a specific Git repository.
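As a rough sketch, the role setup described above could be bootstrapped as follows (role and database names follow this post's examples; adjust to your environment):

-- Create the three roles (requires a role with CREATE ROLE, e.g., securityadmin)
USE ROLE securityadmin;
CREATE ROLE IF NOT EXISTS secrets_admin;
CREATE ROLE IF NOT EXISTS git_admin;
CREATE ROLE IF NOT EXISTS developer;

-- Allow the admin roles to work in the integrations schema
GRANT USAGE ON DATABASE admin_db TO ROLE secrets_admin;
GRANT USAGE ON DATABASE admin_db TO ROLE git_admin;
GRANT USAGE, CREATE SECRET ON SCHEMA admin_db.integrations TO ROLE secrets_admin;
GRANT USAGE, CREATE GIT REPOSITORY ON SCHEMA admin_db.integrations TO ROLE git_admin;

-- API integrations are account-level objects, so git_admin also needs this
USE ROLE accountadmin;
GRANT CREATE INTEGRATION ON ACCOUNT TO ROLE git_admin;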

Generate a Personal Access Token in GitHub under Developer Settings. This token is required by the secrets administrator to configure the secret. The token can be generated through the following menu path within GitHub.

Settings > Developer Settings > Personal Access Tokens > Fine-grained tokens > Generate new Token.

Provide access to the specific repositories within GitHub and grant the token read access to the repository contents (the Contents: Read-only permission).

Snowflake supports integration with the following Git platforms, and the process of generating a token is similar for each:

  • GitHub

  • GitLab

  • BitBucket

  • Azure DevOps

  • AWS CodeCommit

				
-- To be run by the secrets administrator
USE ROLE secrets_admin;
USE DATABASE admin_db;
USE SCHEMA admin_db.integrations;

CREATE OR REPLACE SECRET snow_git_secret
  TYPE = password
  USERNAME = 'snow_user' -- Git user ID
  PASSWORD = 'github_pat_11ARY76PIcgNLDUUEZPA6ev7NU4EH'; -- token generated from GitHub Developer Settings

-- Grant usage on the secret to git_admin, per the process flow above
GRANT USAGE ON SECRET snow_git_secret TO ROLE git_admin;

-- To be run by the Git administrator
USE ROLE git_admin;
USE DATABASE admin_db;
USE SCHEMA admin_db.integrations;

CREATE OR REPLACE API INTEGRATION git_api_integration
  API_PROVIDER = git_https_api
  API_ALLOWED_PREFIXES = ('https://github.com/acme')
  ALLOWED_AUTHENTICATION_SECRETS = (snow_git_secret)
  ENABLED = TRUE;

Once the Git secret and API integration are in place, the repository object should be created in Snowflake; it acts as a “stage” location for code fetched from Git.

				
CREATE OR REPLACE GIT REPOSITORY snowflake_snowpark_utilities
  API_INTEGRATION = git_api_integration
  GIT_CREDENTIALS = snow_git_secret
  ORIGIN = 'https://github.com/acme/snowflake_snowpark_utilities.git';
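To verify the repository is wired up correctly, the branches and fetched files can be inspected directly:

-- List the branches visible in the repository object
SHOW GIT BRANCHES IN GIT REPOSITORY snowflake_snowpark_utilities;

-- Browse the files fetched from the main branch
LS @snowflake_snowpark_utilities/branches/main;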

Developers or users who refresh the code from Git will require the WRITE privilege on this repository; READ is sufficient to reference its files from Snowpark code.
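For example, the Git administrator could grant both privileges to the developer role from earlier:

GRANT READ, WRITE ON GIT REPOSITORY admin_db.integrations.snowflake_snowpark_utilities TO ROLE developer;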

Snowpark - Git Integration

Now that the Git integration is set up and the Snowflake account is configured to talk to Git, the next step is to source the Python code directly from Git. The following is a simple Python handler that takes an input table name, a filter value, and an output table name, and writes out a new table with the filter applied.

				
import snowflake.snowpark as snowpark
from snowflake.snowpark.functions import col


def main(session: snowpark.Session, inputTableName, filter_value, outputTableName):
    # Read the input table, drop rows matching the given market segment,
    # and write the result to the output table
    df = session.table(inputTableName)
    df.filter(col("C_MKTSEGMENT") != filter_value) \
        .write.mode("overwrite").save_as_table(outputTableName)
    return outputTableName + " table successfully written"

Once the code is ready and tested locally, push it to the main branch of Git. The Python code can then be referenced directly, as shown below.

				
CREATE OR REPLACE PROCEDURE demo_db.git_demo.filter_table(inputTableName VARCHAR, filter_value VARCHAR, outputTableName VARCHAR)
  RETURNS STRING
  LANGUAGE PYTHON
  RUNTIME_VERSION = '3.9'
  PACKAGES = ('snowflake-snowpark-python')
  IMPORTS = ('@admin_db.integrations.snowflake_snowpark_utilities/branches/main/common/filter_table.py')
  HANDLER = 'filter_table.main';
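The procedure can then be invoked like any other stored procedure. As an illustration, assuming the TPC-H sample data that ships with Snowflake (its CUSTOMER table contains the C_MKTSEGMENT column used above) and a hypothetical output table name:

CALL demo_db.git_demo.filter_table(
    'SNOWFLAKE_SAMPLE_DATA.TPCH_SF1.CUSTOMER',   -- input table
    'AUTOMOBILE',                                 -- segment to filter out
    'demo_db.git_demo.customer_filtered');        -- output table (hypothetical)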

As shown above, the code is directly referenced in the IMPORTS section of the procedure definition. Whenever the code is updated and pushed to Git, it can be refreshed using a simple command, as shown below.

				
ALTER GIT REPOSITORY admin_db.integrations.snowflake_snowpark_utilities FETCH;

The deployment process in CI/CD tools like Jenkins or GitHub Actions then only needs to execute the above command to refresh the code in Snowflake.

Limitations

Snowflake-Git integration is a powerful feature, although it currently has some limitations:

  • Access to the repository is read-only for stored procedures created using the CREATE OR REPLACE PROCEDURE syntax; writing back to Git is only supported from notebooks.

  • Access is only supported over the public internet.

  • It is currently not supported in native apps or data-sharing scenarios.

Closing

Snowflake is continuously adding features to its stack, and as more features emerge, organizations must maintain a strong deployment process. At phData, our engineers have extensive experience building applications in Snowflake and managing them efficiently with a robust CI/CD process.

If you want to streamline your Snowflake deployment and enhance your CICD process, our team at phData is here to help. 

Reach out to learn how we can support your organization in building a scalable, efficient Snowflake environment!
