We at phData have written extensively about creating centralized data products in Power BI. Power BI's (and now Fabric's) ability to centralize dashboards and Semantic Models (formerly datasets), so that reporting is easily accessible and data can be shared without unnecessary duplication, is second to none in the business intelligence realm. However, we have not yet touched on dataflows and the importance of this unique type of data product.
In this blog, we will explain dataflows and their use cases and show an example of how to bring data from Snowflake AI Data Cloud into a dataflow.
What are Dataflows, and Why are They So Great?
Most Power BI developers are familiar with Power Query, which is the data transformation layer of Power BI. It allows a developer to perform routine ETL tasks in a low-code/no-code environment using drag-and-drop tools or custom M-Code.
The typical flow of data from the data source to the Power BI dashboard is the following:
Tables are queried from the source
The data is transformed in Power Query
Data is then loaded into Power BI
The data model is created with all of the necessary relationships between the tables.
Calculated columns and measures are created using DAX
The data visualizations are constructed on the canvas
Power Query is useful because it allows us to perform more advanced data cleansing and transformations than what can be done in DAX. It also pushes those transformations further upstream, positively impacting the dashboard performance because the transformations don’t have to be calculated upon opening the report or on the fly.
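To make this concrete, here is a minimal Power Query (M) sketch of the kind of cleansing steps described above. The table and column names are hypothetical, and the in-memory #table source stands in for whatever data connector you actually use:

```powerquery-m
let
    // Illustrative in-memory table; in practice, Source comes from your data connector
    Source = #table(
        {"OrderID", "Region", "Amount"},
        {{1, " east ", 100}, {2, "WEST", null}, {3, "East", 250}}
    ),
    // Remove rows with missing amounts
    Filtered = Table.SelectRows(Source, each [Amount] <> null),
    // Standardize the Region text (trim whitespace, proper-case)
    Cleaned = Table.TransformColumns(
        Filtered,
        {{"Region", each Text.Proper(Text.Trim(_)), type text}}
    )
in
    Cleaned
```

Because these steps run during data refresh rather than at report-viewing time, the dashboard only ever sees the cleaned result.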
But what happens when we want to use the same source tables to create a different Power BI Semantic Model? In theory, you could copy all of the transformations in Power Query and replicate them in the new PBIX file, but that's not a great method for several reasons. First, any additional transformations or edits to existing ones would need to be replicated in each PBIX file. Second, it drives up consumption of compute resources in your database.
Enter Dataflows, which are essentially the cloud version of Power Query. Dataflows are one of the data products that can be created in Fabric’s Power BI workload. Dataflows take all of the functionality that Power Query offers and separate it from the PBIX file so that a developer can reuse the transformations performed in the Dataflow for multiple Power BI Semantic Models. With Dataflows, you can use the same tables with the same transformations performed across as many Semantic Models as you need or provide access to another developer so they can create their Semantic Models and dashboards.
If you’re using a Dataflow, the flow of data from the source to the Power BI dashboard is the following:
Tables are queried from the source directly into a Dataflow
The data is transformed in the Dataflow
Data is loaded from the Dataflow into Power BI, and the data model is created. This step can be performed as many times as needed across as many data models.
If further transformations are needed, those can be done in Power Query. This may be necessary if, for example, one of your data models connects to another table outside the Dataflow that needs to be transformed.
Calculated columns and measures are created using DAX
The data visualizations are constructed on the canvas
How to Create a Dataflow With Snowflake Data
Follow the steps below to create a Dataflow using a Snowflake connection.
Note: Dataflows should be enabled; if you do not see the option to create one, contact your Power BI (Fabric) tenant admin.
Click on the + New Item button at the top of the workspace where you want to create your Dataflow.
Select Dataflow Gen1 if you are not yet on Fabric and Dataflow Gen2 if Fabric is enabled.
A new UI will open that looks identical to Power Query (in fact, Power Query is labeled on the top left)
Click on the Get data drop-down
You will see dozens of data sources available to connect to, including other Dataflows. Search for and select Snowflake.
A new window will open. Provide the Snowflake Server, Warehouse, and Role if required for your use case.
If a Power BI Gateway is required, configured, and access has been provided, it will auto-populate in the Connections section after you provide the Server and Warehouse details (details from the last two steps are not shown in the screenshot for privacy reasons).
Using the Choose data navigation panel on the left of the new window, select the table(s) that need to be brought into your Dataflow and then click on the green Create button.
The table(s) will now be available in the familiar Power Query UI, where you can perform all of the transformations.
After you’ve created the necessary transformation steps, click on the green Publish button at the bottom right. This will publish the Dataflow to the workspace and automatically trigger a refresh.
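Under the hood, the steps above generate an M query inside the Dataflow. As a rough sketch of what that generated query might look like (the server, warehouse, database, schema, table, and column names below are all placeholders, not values from this walkthrough):

```powerquery-m
// Sketch of the M query a Dataflow generates for a Snowflake source.
// All identifiers here are placeholders for your own Snowflake objects.
let
    Source = Snowflake.Databases("myaccount.snowflakecomputing.com", "MY_WAREHOUSE"),
    Database = Source{[Name = "SALES_DB"]}[Data],
    Schema = Database{[Name = "PUBLIC", Kind = "Schema"]}[Data],
    OrdersTable = Schema{[Name = "ORDERS", Kind = "Table"]}[Data],
    // Any transformations applied in the Dataflow UI appear as additional steps here
    RemovedNulls = Table.SelectRows(OrdersTable, each [ORDER_ID] <> null)
in
    RemovedNulls
```

You can view and edit this query at any time through the Dataflow's Advanced Editor.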
Now that you’ve created the Dataflow, you can connect to it in Power BI to create your data model:
With your Power BI report open, select Get data and then Dataflows from the ribbon at the top
Navigate to the Dataflow by selecting the appropriate Workspace, Dataflow, and table(s) from the pop-up window.
Click Load if no other transformations need to be done or Transform data if you want to further transform the data in Power Query.
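For reference, connecting to a Dataflow also generates a short M query behind the scenes. A sketch of its typical shape is below; the workspace, Dataflow, and entity names are placeholders, and in practice the generated query usually references GUIDs (workspace and Dataflow IDs) rather than display names:

```powerquery-m
// Sketch of the M query generated when connecting to a Dataflow entity.
// Names are placeholders; real generated queries typically use GUIDs.
let
    Source = PowerPlatform.Dataflows(null),
    Workspaces = Source{[Id = "Workspaces"]}[Data],
    Workspace = Workspaces{[workspaceName = "Sales Analytics"]}[Data],
    Dataflow = Workspace{[dataflowName = "Snowflake Sales Dataflow"]}[Data],
    Orders = Dataflow{[entity = "ORDERS", version = ""]}[Data]
in
    Orders
```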
Tips and Reminders for Using Power BI Dataflows with Snowflake
Your Dataflow with Snowflake data is now ready to be consumed by you and anyone you provide access to. One thing to remember: at the time of writing, Dataflows are not supported by Power BI Deployment Pipelines. Therefore, if you have several workspaces for different deployment stages, we recommend creating your Dataflow directly in the production workspace.
Closing
Microsoft, and Power BI in particular, has made strides in recent years to enable organizations to build data products connected to Snowflake. Dataflows are not a new tool, but they are just as important when creating a holistic data strategy in Power BI (or Fabric) because they allow us to reduce duplicative data and costly data tasks within our organizations.
If you want to learn more about integrating Snowflake with Power BI or have questions about optimizing your data strategy, our experts are here to help. Our team can guide you through the process and provide tailored solutions to your organization’s unique needs.
FAQs
Is a Power BI Gateway required for Dataflows and Snowflake?
If your organization requires you to use Snowflake Private Link rather than Public, you will need a Gateway to connect your Dataflow to Snowflake.
If I need to migrate, can I import the Power Query steps from a PBIX file into Dataflows?
As of now, no. The easiest way to do this is to copy the entire M-Code out of the Advanced Editor window of Power Query for each query and paste it into the Advanced Editor of the Dataflow.