Snowflake Summit 2023 was full of exciting announcements across all workloads, but AI/ML was more prominent than ever. From Generative AI and Large Language Models (LLMs) to classic regression and classification with SnowML, it’s clear that Snowflake is prioritizing advanced analytical applications on the Data Cloud.
Here’s a quick rundown of everything that Data Scientists and ML Engineers should be excited about.
GPU Compute Options & NVIDIA Partnership
The biggest headline at Snowflake Summit 2023 was a new partnership between Snowflake and NVIDIA. This partnership will bring GPU acceleration to Snowflake Warehouses and Snowpark Container Services (see below!). GPUs are vital for Generative AI and LLM applications, as well as many deep learning applications that use unstructured data.
To learn more, check out our dedicated post on Snowflake and NVIDIA.
Snowpark Container Services
Snowpark Container Services is a new container service that will allow developers to run any containerized application within Snowflake. This opens the door to new architectures and programming languages running within the Snowflake security perimeter.
It will also bring the code to the data to reduce network overhead and improve performance. Snowpark Container Services will open up many possibilities for AI/ML, such as real-time model inference, LLM serving, parameter serving, and more!
If you’d like to know more, check out our post about Snowpark Container Services.
Snowpark Python Enhancements
The Snowpark Python team has clearly been hard at work over the past year because there was a whole slew of exciting new announcements for Snowpark. Here are the most exciting ones for ML and DS:
- Vectorized UDTFs and UDAFs – Snowflake has added new types of UDFs to Snowpark Python. Vectorized User Defined Table Functions (UDTFs) will allow vectorized processing of entire tables through a pandas interface. User Defined Aggregate Functions (UDAFs) will speed up groupby/agg operations that take multiple rows as input and return a single row as output.
- Python 3.9 and 3.10 – In the past, we’ve been limited to Python 3.8 for Snowpark Python. With this announcement, we’ll now be able to use Python 3.9 and 3.10 to keep up with the latest Python language features.
- Unstructured Data Processing – UDFs are no longer limited to native structured data types. Users can now pass unstructured data as input to UDFs, such as images, video, audio, or custom formats. This means that Snowpark Python can now be used for all types of unstructured data.
- More Python Libraries in Anaconda – Snowflake announced a bunch of new additions to the Snowflake Anaconda channel, including Hyperopt, LangChain, and spaCy.
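To make the "many rows in, one row out" idea behind UDAFs concrete, here is a minimal plain-Python sketch. It runs locally with no Snowflake connection. The class name WeightedMean and the helper run_locally are made up for illustration, and the method names (accumulate, merge, finish, plus an aggregate_state property) follow our understanding of the handler shape Snowflake uses for Python aggregate functions, so treat them as an assumption and check the current documentation before relying on them.

```python
class WeightedMean:
    """Hypothetical aggregate handler: many input rows in, one value out."""

    def __init__(self):
        # Running totals are the only state the aggregate needs to keep.
        self._weighted_sum = 0.0
        self._weight_total = 0.0

    @property
    def aggregate_state(self):
        # Snapshot of the partial result, so it can be shipped elsewhere.
        return (self._weighted_sum, self._weight_total)

    def accumulate(self, value, weight):
        # Called once per input row.
        self._weighted_sum += value * weight
        self._weight_total += weight

    def merge(self, other_state):
        # Combine a partial result that was computed on another partition.
        other_sum, other_total = other_state
        self._weighted_sum += other_sum
        self._weight_total += other_total

    def finish(self):
        # Produce the single output value for the whole group.
        if self._weight_total == 0:
            return None
        return self._weighted_sum / self._weight_total


def run_locally(partitions):
    """Simulate an engine that aggregates partitions separately, then merges."""
    merged = WeightedMean()
    for rows in partitions:
        partial = WeightedMean()
        for value, weight in rows:
            partial.accumulate(value, weight)
        merged.merge(partial.aggregate_state)
    return merged.finish()


result = run_locally([[(10.0, 1.0), (20.0, 3.0)], [(30.0, 1.0)]])
```

The point of the merge step is that each partition can be aggregated independently and combined afterwards, which is what lets this kind of function scale across a warehouse.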
Snowpark ML APIs
Snowflake announced two new APIs to support the ML lifecycle:
ML Modeling API
The ML Modeling API includes interfaces for preprocessing data and training models. It is built on top of popular libraries like scikit-learn and XGBoost, but parallelizes data operations so that they run in a distributed manner on Snowpark. This means that data scientists can scale their modeling efforts beyond what they could fit in memory on a conventional compute instance.
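As a rough illustration of why distributed preprocessing can go beyond a single machine's memory, the sketch below fits a standard scaler by combining small per-partition summaries instead of loading every row at once. This is not the Snowpark ML API. The function names (partial_stats, fit_scaler, transform) are hypothetical, and the sum-of-squares formula is the simplest option rather than the most numerically robust one.

```python
import math


def partial_stats(chunk):
    """Summarize one partition as (count, sum, sum of squares)."""
    return len(chunk), sum(chunk), sum(x * x for x in chunk)


def fit_scaler(partitions):
    """Combine per-partition summaries into a global mean and std deviation."""
    count = total = total_sq = 0
    for chunk in partitions:
        n, s, ss = partial_stats(chunk)
        count += n
        total += s
        total_sq += ss
    if count == 0:
        raise ValueError("cannot fit a scaler on empty data")
    mean = total / count
    # Population variance; clamp at zero to absorb floating-point rounding.
    variance = max(total_sq / count - mean * mean, 0.0)
    return mean, math.sqrt(variance)


def transform(chunk, mean, std):
    """Standardize one partition using the globally fitted parameters."""
    if std == 0:
        return [0.0 for _ in chunk]
    return [(x - mean) / std for x in chunk]


partitions = [[2.0, 4.0], [4.0, 4.0], [5.0, 5.0, 7.0, 9.0]]
mean, std = fit_scaler(partitions)
scaled = transform([9.0, 5.0], mean, std)
```

Only three numbers per partition ever need to be held together, so the fit step stays cheap no matter how many rows each partition contains.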
MLOps API
The MLOps API is built to help streamline model deployments. The first release of the MLOps API includes a Model Registry to help track and version models as they are developed and promoted to production.
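For readers who have not used a model registry before, here is a toy in-memory version that shows the two jobs described above: versioning models as they are logged and tracking which version is promoted to production. Everything in it (ModelRegistry, log_model, promote, production_model, the stage names) is hypothetical and written for this post. It does not reflect the actual Snowflake Model Registry interface.

```python
class ModelRegistry:
    """Toy registry: versions models per name and tracks a production stage."""

    def __init__(self):
        self._models = {}  # model name -> list of version records

    def log_model(self, name, artifact, metrics=None):
        """Store a new version and return its version number (starting at 1)."""
        versions = self._models.setdefault(name, [])
        record = {
            "version": len(versions) + 1,
            "artifact": artifact,
            "metrics": dict(metrics or {}),
            "stage": "development",
        }
        versions.append(record)
        return record["version"]

    def get(self, name, version):
        """Look up a specific version or raise KeyError if it is unknown."""
        for record in self._models.get(name, []):
            if record["version"] == version:
                return record
        raise KeyError(f"{name} has no version {version}")

    def promote(self, name, version):
        """Mark one version as production and archive the previous one."""
        target = self.get(name, version)  # validate before changing anything
        for record in self._models[name]:
            if record["stage"] == "production":
                record["stage"] = "archived"
        target["stage"] = "production"

    def production_model(self, name):
        """Return the record currently in production, or None."""
        for record in self._models.get(name, []):
            if record["stage"] == "production":
                return record
        return None


registry = ModelRegistry()
v1 = registry.log_model("churn", "weights-a", {"auc": 0.81})
v2 = registry.log_model("churn", "weights-b", {"auc": 0.84})
registry.promote("churn", v1)
registry.promote("churn", v2)
```

After the two promotions, version 2 is in production and version 1 is archived, which is the audit trail a registry is meant to give you.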
ML-Powered Functions
Snowflake also announced a collection of ML-Powered functions that can be executed on Snowflake’s SQL engine. These SQL functions are intended to bring the power of ML to data practitioners who might not have the expertise to train their own models.
The current suite of ML-Powered functions is meant to tackle common business problems. Here are the ML-Powered Functions that were discussed at Summit:
- Forecasting – The Forecast function helps create forecasts based on historical time series and automatically accounts for seasonality, scaling, and other trends behind the scenes. The output of the function is a prediction of the time series over a specified future horizon.
- Anomaly Detection – The Anomaly Detection function finds outliers within a time series. This can be very powerful to help replace static thresholds when searching for outliers or creating alerts.
- Contribution Explorer – The Contribution Explorer function is intended to automatically identify contributing factors behind a particular trend. It is meant to empower root-cause analysis using multivariate data.
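To show what "replacing static thresholds" can mean in practice, the sketch below flags a point as an outlier when it sits far from the recent history of the series, measured in standard deviations. This is a simple rolling z-score written in plain Python, not the algorithm behind Snowflake's Anomaly Detection function (which was not described in detail). The function name rolling_anomalies and its window and z_threshold parameters are our own assumptions.

```python
from statistics import mean, stdev


def rolling_anomalies(series, window=5, z_threshold=3.0):
    """Return indexes whose value is far from the preceding window's mean."""
    flagged = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # A perfectly flat history: any change at all is unusual.
            is_outlier = series[i] != mu
        else:
            is_outlier = abs(series[i] - mu) / sigma > z_threshold
        if is_outlier:
            flagged.append(i)
    return flagged


daily_orders = [10, 11, 10, 12, 11, 10, 40, 11, 10, 12]
outliers = rolling_anomalies(daily_orders)
```

With this data, a static alert set at, say, 50 orders would never fire, while the rolling check flags the jump to 40 because it is unusual relative to the days just before it.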
Parting Thoughts
The sheer number of announcements and enhancements Snowflake made at Summit related to AI/ML this year is truly impressive. It is very exciting to see how quickly Snowflake is moving beyond traditional data warehousing workloads to bring ML applications to the tremendous quantities of data within Snowflake’s secure platform.
Some of these features are still in Public or Private Preview, but as Snowflake’s 2023 Partner of the Year, phData has had the pleasure of working with all of the new features before they’re released. If you’re curious about these new announcements or how to succeed with Snowflake, we’re happy to answer any questions!