Imagine this scenario: you’ve built and trained your machine learning models on Snowpark and have stored your forecasting results in the Snowflake Data Cloud. Your business stakeholders want to know how well your ML models are performing, but they may not have the technical expertise to query the data themselves.
What can you do? Great question!
Follow along, and you’ll see that with just over 100 lines of Python code and a dozen calls to the Streamlit library, it is possible to create an interactive, data-driven ML monitoring web application to visualize and monitor your ML models’ performances for your business stakeholders to view, powered by Snowflake.
This post is a continuation of a previous post, CPG forecasting on Snowflake with Snowpark, as we’ll be using the data from its forecasting results stored in Snowflake.
What is Streamlit?
Streamlit is an open-source app development Python framework that was created specifically to make web development simple for machine learning engineers and data scientists. With Streamlit, it’s possible to create web tools in Python following the same script writing flow that a Python developer is used to.
Using Streamlit is a great way for data scientists to create rich visuals using Python, as it does not require in-depth knowledge of HTML/CSS or Javascript frameworks including React and Vue, with event handlers and callbacks.
Why Use with Snowflake?
By using Snowflake as a data source for your Streamlit application, not only can you create an end-to-end machine learning solution in Snowflake’s environment, but you also inherit the data security and query performance that Snowflake provides.
It’s worth mentioning that Snowflake acquired Streamlit in March 2022. When writing this post, it is quite easy to connect to Snowflake in your Streamlit application. However, coming to a release near you, there will be the ability to create Streamlit objects directly inside Snowflake.
With that, let’s begin!
Step 1: Setup and Configuration
The required libraries for this demo, that you can pip install into your environment are:
- streamlit==1.15.1
- snowflake-connector-python[pandas]==2.7.12
- plotly==5.11.0
- cryptography==36.0.2 (necessary if snowflake account requires key-pair authentication)
With all these packages in place, create a main.py file and import the following packages into your file:
import streamlit as st
import plotly.express as px
import snowflake.connector as sf
import os
from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization
Step 2: Hello World
Our “hello world” introduction to Streamlit will be to display a title on our web page. This can easily be accomplished by first setting the layout of our web app to wide (optional) and then calling the API st.title().
# set_page_config needs to be the first Streamlit command in your script
st.set_page_config(layout="wide")
st.title("CPG Forecasting Models Monitoring")
Next, open a terminal window, navigate to your project folder, and type: streamlit run main.py. This will start up a local Streamlit server that will open your web app on your default browser on localhost:8501.
Step 3: Connect and Query From Snowflake
To connect to Snowflake with key-pair authentication, we first pass in the path to our private key for our script to decrypt. Then we call the snowflake_connector connect method with the provided parameters.
The use of the @st.experimental_singleton() function decorator is so our Snowflake connection is cached after the first run.
Make sure to replace all arrow-bracketed variables with your respective Snowflake variable values.
If your Snowflake account does not require key-pair authentication, you only need the init_connection() function portion of the code. Comment out private_key=pkb and uncomment password=<PASSWORD>.
with open("", "rb") as key:
p_key = serialization.load_pem_private_key(
key.read(),
# Assumes env variable for private key passphrase exists
password=os.environ["SNOWSQL_PRIVATE_KEY_PASSPHRASE"].encode(),
backend=default_backend()
)
pkb = p_key.private_bytes(
encoding=serialization.Encoding.DER,
format=serialization.PrivateFormat.PKCS8,
encryption_algorithm=serialization.NoEncryption()
)
@st.experimental_singleton()
def init_connection():
return sf.connect(
user=,
#password=,
account=,
private_key=pkb,
warehouse=,
database=,
schema=,
role="
)
# connecting to Snowflake
conn = init_connection()
To query from Snowflake in Python, first create a cursor from the connection object and then pass in a SQL query. A cursor has a fetch_pandas_all() method that returns the output of the query as a dataframe.
Notice that we are now using the @st.experimental_memo() function decorator for our data load method to store our dataframe on local disk. We used the @st.experimental_singleton() decorator previously for our Snowflake connection because it is designed to store non-data objects, whereas @st.experimental_memo() is designed to store data. Without this decorator, any update to our Streamlit app, such as input from widgets, will re-run the query in Snowflake.
query = """
SELECT "sales_date",
"store",
"item",
"sales",
"sales_forecast",
"abs_error"
FROM cpg_forecasts_demo
WHERE "subset" = 'test';
"""
@st.experimental_memo()
def load_data():
cur = conn.cursor().execute(query)
return cur.fetch_pandas_all()
df = load_data()
This dataset has 10 stores and 50 items per store. Each store-item combo has actual sales and forecasted sales value for 90 days, from our forecasting results in Snowpark.
Step 4: Create your First Graph with Streamlit
Our first graph is going to be a line chart with the sales date on our x-axis and sales on our y-axis, with one line for actual sales and another line for forecasted sales.
There is an out-of-the-box Streamlit line chart API, st.line(), however, it does not have many customization options. For better customization, two of the options include creating a graph with Altair or Plotly and passing the graph object to Streamlit. Both Altair and Plotly are interactive visualization Python libraries built on top of Javascript libraries (Vega.js and D3.js, respectively).
I chose Plotly for this post, even though st.line() is built on top of Altair, because Plotly’s syntax may be more familiar to Python developers who have experience with Seaborn or Matplotlib.
agg_df = df.groupby(['sales_date'], as_index=False).sum()
# create a Plotly line plot
line_fig = px.line(
agg_df,
x="sales_date",
y=["sales", "sales_forecast"],
title="Actual Sales vs Forecasted Sales",
labels={
"sales_date": "Sales Date",
"value": "Sales",
"variable": "Legend"
}
)
# Plotly graph configs
legend_names = {"sales": "Actual Sales", "sales_forecast": "Forecasted Sales"}
line_fig.for_each_trace(lambda t: t.update(name=legend_names[t.name],
legendgroup=legend_names[t.name]))
line_fig.update_layout(
xaxis=dict(showgrid=False),
legend=dict(
yanchor="top",
y=0.99,
xanchor="right",
x=0.99
),
title_x=0.5,
height=600
)
# passing in the Plotly graph object to Streamlit
st.plotly_chart(line_fig, use_container_width=True)
By simply passing in our Plotly graph object to the Streamlit call st.ploty_chart, an interactive line chart appears on our web app.
How easy was that?!
Step 5: Add Multi-Select Widgets to a Sidebar
A graph showing the aggregated sales and forecasts for all stores and all items is a decent visualization for a high-level overview. For more granular views, we will use the Streamlit sidebar and multi-select widget to allow users to drill down to the model performance of a specific store and/or item.
Calling the method st.sidebar automatically produces a collapsible sidebar on your web page. Adding elements to the sidebar can be achieved with object notation by appending the desired Streamlit method to the end of st.sidebar.
# STREAMLIT SIDEBAR
st.sidebar.markdown('### Use store and item filters to drilldown on different granularities of model performance 🔬')
# multiselect streamlit calls
stores = sorted(df['store'].unique().tolist())
items = sorted(df['item'].unique().tolist())
store_selection = st.sidebar.multiselect(
'Store:',
stores,
default=stores
)
item_selection = st.sidebar.multiselect(
'Item:',
items,
default=items
)
We set the multi-select Streamlit object to store_selection and item_selection variables that we will use to filter our dataframe.
The code snippet for the sidebar contents should appear before the code snippet of our graph. This is because Streamlit executes the Python script from top to bottom, and the dataframe used by the graph should honor the filters from the multi-select widgets.
Notice how agg_df previously referenced the dataframe df, and is now referencing filtered_df.
# create a mask from the multiselect options to filter df by
mask = (df['store'].isin(store_selection)) & (df['item'].isin(item_selection))
filtered_df = df[mask]
# agg_df = df.groupby(['sales_date'], as_index=False).sum()
agg_df = filtered_df.groupby(['sales_date'], as_index=False).sum()
With the sidebar widgets and filtering logic in place, your web app should look something like the image below. All the stores and items are selected as default, but users can now deselect certain stores and items and the graph will update automatically.
Step 6: Add Model Metrics
For more insights into model performance, we will include forecasting model metrics above the graph. For this example, we will display values for:
- Actual Volume: the sum of sales
- Forecasted Volume: the sum of forecasted sales
- Accuracy: 1 minus the mean average percent error
- Mean Average Deviation: the average absolute error
The code snippet for this section will appear after our multi-select widget but before our graph.
actual_volume = format(filtered_df['sales'].sum(), ",")
predicted_volume = format(int(filtered_df['sales_forecast'].sum()), ",")
filtered_df['abs_percent_error'] = abs(filtered_df['sales'] - filtered_df['sales_forecast'])/filtered_df['sales']
mape = filtered_df['abs_percent_error'].mean()
accuracy = str(round((1 - mape)*100.0)) + '%'
mad = str(round(filtered_df['abs_error'].mean(), 1))
The Streamlit method st.columns() allows you to add elements in side-by-side container objects. For four evenly spaced out columns, set four variables to st.columns(4). For two columns, the first taking three-quarters the width, set two variables to st.columns(3,1).
Just like how we used object notation to add markdowns and multi-select widget elements to the sidebar, we will use object notation to add markdowns to each column.
metric_col1, metric_col2, metric_col3, metric_col4 = st.columns(4)
metric_col1.markdown('##### Actual Volume:\n' + actual_volume)
metric_col2.markdown('##### Forecasted Volume:\n' + predicted_volume)
metric_col3.markdown('##### Accuracy:\n' + accuracy)
metric_col4.markdown('##### Mean Average Deviation:\n' + mad)
# line break to separate metrics from plot
st.markdown("""---""")
Nice! Our Streamlit ML Monitoring web app is coming together.
Step 7: Add a Dataframe for More Detail
For one final touch, we will display the filtered dataframe on our web app. For a less-cluttered look, use a Streamlit checkbox to toggle whether the dataframe is displayed or not.
# st.checkbox value defaults to False
show_df = st.checkbox('Show raw data')
# show dataframe is checkbox selected
if show_df:
st.dataframe(filtered_df, use_container_width=True)
Done!
With exactly 12 Streamlit calls, you can create an interactive web app to monitor the performance of your CPG forecasting models. Apps you make with Streamlit can be hosted internally within your organization or even on Streamlit cloud and shared with your business stakeholders.
In conclusion, after using Snowpark to forecast consumer packaged goods, we connected to Snowflake and queried the forecast data to create a Streamlit web app so that business stakeholders can easily view and monitor the performance of our models for each store and item.
Interested in expanding your machine learning workflow in Snowflake with Streamlit?