Have you ever encountered a project that requires you to join and query several tables to feed into a dashboard, but due to various limitations (i.e., database permissions, ETL capability, processing, etc.), it has to be done using custom SQL in Tableau?Â
Hopefully, you don’t run into this scenario because joining and querying multiple tables in Tableau using custom SQL is not recommended due to its impact on performance. Still, if you ever do, I’ve compiled some tips and tricks to help you effectively build and implement your custom SQL in Tableau with minimal impact on performance and processing.
In this blog, we will cover effective strategies for optimizing Custom SQL queries in Tableau to enhance your performance and streamline data visualization. These strategies will range from ways to reduce the size of your data, to your query structure. In each section, you will find examples that demonstrate how to implement each strategy and drive the desired outcome. Hopefully these tips and examples will give you a kick start in your SQL optimization journey.
Tip 1: Only Bring in the Data you Need
An excellent first step to optimizing your query is removing the data you don’t need.Â
Refine what you’re bringing in, starting with the FROM
clause; the FROM
clause is the first part of the executed query, as per the standard SQL execution order. This means that if JOINS
are present in your query, the FROM
clause will be the first part read.
So let’s use an example: say your goal is to join the tables order and detail from a database called db, and you’re using the field sku to join the two tables. While the detail table below shows 9 columns, this table has 100 columns total, with the visual-only showing the first 9 columns.Â
Here are some example tables:
order_number | sku | date |
---|---|---|
3287493 | 1234 | 1/13/2023 |
2846287 | 4444 | 3/10/2023 |
1047263 | 5454 | 4/18/2023 |
sku | class | sub_class | vendor | state | region | type | model_year | rate |
---|---|---|---|---|---|---|---|---|
1234 | Power tool | pt1 | Dotten | Maine | Northern | power | 2023 | $544 |
4444 | Hand tool | ht2 | Stanley | Utah | Western | tool | 2022 | $130 |
5454 | Power tool | pt2 | Press | Florida | Southern | power | 2022 | $470 |
6711 | Ice maker | ap1 | Scottie | Nevada | Western | appliance | 2023 | $255 |
The first query below will bring in both full tables and join them on the sku field. This means all 100 columns from the detail table will be included in this join, making it a heavy lift. The second query will bring in only the fields specified after the SELECT
clause, thus significantly reducing the number of columns brought in by the JOIN
.Â
Optimizing your query by only bringing in the necessary columns, mainly when your subquery allows you to considerably reduce the number of columns, will result in significant performance improvement.
SELECT *
FROM db.order
JOIN db.detail
ON order.sku = detail.sku
SELECT *
FROM db.order
JOIN (
SELECT
sku,
class,
vendor,
rate
FROM db.detail
)
ON order.sku = detail.sku
Tip 2: Apply Filters or Joins to Your Tables
Adding simple filters to your query is an easy way to reduce table size. Depending on the desired output, many filters can do the trick. You may only want your table to include data within a specific date range or associated with a particular category; using filters allows you to render only the data you need.
Where applicable, you can also write the filters in your query to be dynamic. This gives your SQL query more flexibility and general application. For example, if you want to isolate all of the rows from the order table where the date was in the last week, you could use the filter below:
SELECT *
FROM db.order
WHERE date >= DATEADD(GETDATE(), -7, date)
Tip 3: Use Key Values
An effective way to organize your data is by using primary keys to build a foundational table to which you can join values. To do this, you first need to determine the level that you want your table.Â
Once you’ve identified your key column(s), you can create a table of unique rows based on the values of one or more primary columns; these keys can be configured as a single column of unique values or as unique value groupings of multiple columns. Creating keys allows for faster query execution and prevents duplication.
The example below creates a keys index using the unique sku and vendor combinations from the tables detail and vendor. The keys are compiled using the SELECT DISTINCT
and UNION
clause, and the additional fields are joined back to the keys through the JOIN
clauses at the bottom of the query. See example:
sku | vendor | channel | code |
---|---|---|---|
1234 | Dotten | store | 14 |
4444 | Stanley | online | 5 |
5454 | Press | online | 14 |
6711 | Scottie | store | 18 |
sku | class | sub_class | vendor | state | region | type | model_year | rate |
---|---|---|---|---|---|---|---|---|
1234 | Power tool | pt1 | Dotten | Maine | Northern | power | 2023 | $544 |
4444 | Hand tool | ht2 | Stanley | Utah | Western | tool | 2022 | $130 |
5454 | Power tool | pt2 | Press | Florida | Southern | power | 2022 | $470 |
6711 | Ice maker | ap1 | Scottie | Nevada | Western | appliance | 2023 | $255 |
SELECT
keys.sku,
keys.vendor,
detail.class,
detail.rate,
vendor.channel,
vendor.code
FROM (
SELECT DISTINCT
sku,
vendor
FROM db.detail
UNION
SELECT DISTINCT
sku,
vendor
FROM db.vendor
) keys
LEFT JOIN db.detail
ON keys.sku = detail.sku AND keys.vendor = detail.vendor
LEFT JOIN db.vendor
ON keys.sku = vendor.sku AND keys.vendor = vendor.vendor;
This example created a foundational table of keys within the query using the sku and vendor columns. Then, the additional columns from the db.detail and db.vendor tables are joined against it to create this refined output table:
sku | vendor | class | rate | channel | code |
---|---|---|---|---|---|
1234 | Dotten | Power tool | $544 | store | 14 |
4444 | Stanley | Hand tool | $130 | online | 5 |
5454 | Press | Power tool | $470 | online | 14 |
6711 | Scottie | Ice maker | $255 | store | 18 |
Tip 4: Reduce Subqueries
Subqueries are simply queries that are nested in the main query. They are an effective way to refine or build on your data, but they can sometimes be challenging to understand and may not be the most efficient option.Â
CTEs are essentially subqueries defined at the front of the query by a WITH
clause. CTEs are temporary result sets stored in memory and do not require disk access; compared to subqueries, they are cleaner, more easily retrievable, and can improve query performance.Â
You can work with CTEs like a table in your query: select from them, join them, etc. If your query is heavy on subqueries and you’re using the same data multiple times, consider using CTEs to clean your query and reduce workload.
Below is an example of Query 2 above, written to use a CTE:
WITH Sales_Detail AS (
SELECT
sku,
class,
vendor,
description,
rate
FROM db.detail
)
SELECT *
FROM db.order
JOIN Sales_Detail
ON order.sku = Sales_Detail.sku
Using CTEs in Tableau can be a little tricky, but here’s a guide that shows you how to use them.
Tip 5: Use an IDE for Build/Test
While you’re building your SQL query, you want to be able to quickly test and iterate to ensure you’re getting the desired results. Re-running your query repeatedly in Tableau can take a long time, especially if your query is doing a lot of work.Â
Download an IDE and connect to your database so you can build and test your query seamlessly and efficiently. This way, you can see the results and make adjustments much faster before transferring your query to Tableau. Two great IDE options are DataGrip and Visual Studio Code.
Conclusion
Optimizing your SQL query in Tableau is a great way to ensure that, despite limitations, performance remains high. Many of these tips are applicable outside of Tableau and are standard guidelines for SQL optimization. As you can see, small changes can significantly impact performance; the key is knowing the best strategy to use for each situation and what to avoid.
If you want more information on Optimizing Custom SQL for Tableau, contact our team of experts!