Optimizing Custom SQL for Tableau

Have you ever encountered a project that requires you to join and query several tables to feed into a dashboard, but due to various limitations (i.e., database permissions, ETL capability, processing, etc.), it has to be done using custom SQL in Tableau?

Hopefully, you don’t run into this scenario because joining and querying multiple tables in Tableau using custom SQL is not recommended due to its impact on performance. Still, if you ever do, I’ve compiled some tips and tricks to help you effectively build and implement your custom SQL in Tableau with minimal impact on performance and processing.

In this blog, we will cover effective strategies for optimizing Custom SQL queries in Tableau to enhance your performance and streamline data visualization. These strategies will range from ways to reduce the size of your data, to your query structure. In each section, you will find examples that demonstrate how to implement each strategy and drive the desired outcome. Hopefully these tips and examples will give you a kick start in your SQL optimization journey.

Tip 1: Only Bring in the Data you Need

An excellent first step to optimizing your query is removing the data you don’t need.

Refine what you’re bringing in, starting with the FROM clause; the FROM clause is the first part of the executed query, as per the standard SQL execution order. This means that if JOINS are present in your query, the FROM clause will be the first part read.

So let’s use an example: say your goal is to join the tables order and detail from a database called db, and you’re using the field sku to join the two tables. While the detail table below shows 9 columns, this table has 100 columns total, with the visual-only showing the first 9 columns.

Here are some example tables:

order

order_number	sku	date
3287493	1234	1/13/2023
2846287	4444	3/10/2023
1047263	5454	4/18/2023

detail (+91 columns not shown):

sku	class	sub_class	vendor	state	region	type	model_year	rate
1234	Power tool	pt1	Dotten	Maine	Northern	power	2023	$544
4444	Hand tool	ht2	Stanley	Utah	Western	tool	2022	$130
5454	Power tool	pt2	Press	Florida	Southern	power	2022	$470
6711	Ice maker	ap1	Scottie	Nevada	Western	appliance	2023	$255

The first query below will bring in both full tables and join them on the sku field. This means all 100 columns from the detail table will be included in this join, making it a heavy lift. The second query will bring in only the fields specified after the SELECT clause, thus significantly reducing the number of columns brought in by the JOIN.

Optimizing your query by only bringing in the necessary columns, mainly when your subquery allows you to considerably reduce the number of columns, will result in significant performance improvement.

Query 1

				
					SELECT *
FROM db.order
JOIN db.detail 
    ON order.sku = detail.sku

Query 2

				
					SELECT *
FROM db.order
JOIN (
SELECT 
    sku, 
    class,
    vendor, 
    rate 
FROM db.detail
) 
    ON order.sku = detail.sku

Tip 2: Apply Filters or Joins to Your Tables

Adding simple filters to your query is an easy way to reduce table size. Depending on the desired output, many filters can do the trick. You may only want your table to include data within a specific date range or associated with a particular category; using filters allows you to render only the data you need.

Where applicable, you can also write the filters in your query to be dynamic. This gives your SQL query more flexibility and general application. For example, if you want to isolate all of the rows from the order table where the date was in the last week, you could use the filter below:

				
					SELECT * 
FROM db.order 
WHERE date >= DATEADD(GETDATE(), -7, date)

Tip 3: Use Key Values

An effective way to organize your data is by using primary keys to build a foundational table to which you can join values. To do this, you first need to determine the level that you want your table.

Once you’ve identified your key column(s), you can create a table of unique rows based on the values of one or more primary columns; these keys can be configured as a single column of unique values or as unique value groupings of multiple columns. Creating keys allows for faster query execution and prevents duplication.

The example below creates a keys index using the unique sku and vendor combinations from the tables detail and vendor. The keys are compiled using the SELECT DISTINCT and UNION clause, and the additional fields are joined back to the keys through the JOIN clauses at the bottom of the query. See example:

vendor

sku	vendor	channel	code
1234	Dotten	store	14
4444	Stanley	online	5
5454	Press	online	14
6711	Scottie	store	18

detail (+91 columns not shown):

sku	class	sub_class	vendor	state	region	type	model_year	rate
1234	Power tool	pt1	Dotten	Maine	Northern	power	2023	$544
4444	Hand tool	ht2	Stanley	Utah	Western	tool	2022	$130
5454	Power tool	pt2	Press	Florida	Southern	power	2022	$470
6711	Ice maker	ap1	Scottie	Nevada	Western	appliance	2023	$255

Query 3

				
					SELECT 
    keys.sku,
    keys.vendor,
    detail.class,
    detail.rate, 
    vendor.channel,
    vendor.code
FROM (
    SELECT DISTINCT 
        sku, 
        vendor 
    FROM db.detail
    UNION
    SELECT DISTINCT 
        sku, 
        vendor 
    FROM db.vendor
) keys
LEFT JOIN db.detail 
    ON keys.sku = detail.sku AND keys.vendor = detail.vendor
LEFT JOIN db.vendor 
    ON keys.sku = vendor.sku AND keys.vendor = vendor.vendor;

This example created a foundational table of keys within the query using the sku and vendor columns. Then, the additional columns from the db.detail and db.vendor tables are joined against it to create this refined output table:

sku	vendor	class	rate	channel	code
1234	Dotten	Power tool	$544	store	14
4444	Stanley	Hand tool	$130	online	5
5454	Press	Power tool	$470	online	14
6711	Scottie	Ice maker	$255	store	18

Tip 4: Reduce Subqueries

Subqueries are simply queries that are nested in the main query. They are an effective way to refine or build on your data, but they can sometimes be challenging to understand and may not be the most efficient option.

CTEs are essentially subqueries defined at the front of the query by a WITH clause. CTEs are temporary result sets stored in memory and do not require disk access; compared to subqueries, they are cleaner, more easily retrievable, and can improve query performance.

You can work with CTEs like a table in your query: select from them, join them, etc. If your query is heavy on subqueries and you’re using the same data multiple times, consider using CTEs to clean your query and reduce workload.

Below is an example of Query 2 above, written to use a CTE:

Query 2 using a CTE

				
					WITH Sales_Detail AS (
SELECT
sku,
class,
vendor,
description,
rate
FROM db.detail
)

SELECT *
FROM db.order
JOIN Sales_Detail
ON order.sku = Sales_Detail.sku

Using CTEs in Tableau can be a little tricky, but here’s a guide that shows you how to use them.

Tip 5: Use an IDE for Build/Test

While you’re building your SQL query, you want to be able to quickly test and iterate to ensure you’re getting the desired results. Re-running your query repeatedly in Tableau can take a long time, especially if your query is doing a lot of work.

Download an IDE and connect to your database so you can build and test your query seamlessly and efficiently. This way, you can see the results and make adjustments much faster before transferring your query to Tableau. Two great IDE options are DataGrip and Visual Studio Code.

Conclusion

Optimizing your SQL query in Tableau is a great way to ensure that, despite limitations, performance remains high. Many of these tips are applicable outside of Tableau and are standard guidelines for SQL optimization. As you can see, small changes can significantly impact performance; the key is knowing the best strategy to use for each situation and what to avoid.

If you want more information on Optimizing Custom SQL for Tableau, contact our team of experts!

Optimizing Custom SQL for Tableau

Tip 1: Only Bring in the Data you Need

Tip 2: Apply Filters or Joins to Your Tables

Tip 3: Use Key Values

Tip 4: Reduce Subqueries

Tip 5: Use an IDE for Build/Test

Conclusion

More to explore

Top Use Cases for Generative AI in Finance

Alteryx Designer 2024.2: New Features and Tools

How to Monetize Healthcare Data in Snowflake

Join our team

Partners

Resources

Software

Accelerate and automate your data projects with the phData Toolkit

Industries

Solutions

Company

Technology Partners

Other Technology Partners

Check out our latest insights

Top Use Cases for Generative AI in Finance

Alteryx Designer 2024.2: New Features and Tools

Data Engineering

Consulting, Migrations, Data Pipelines, DataOps

Change Management, Enablement & Learning

COE, Coaching, PMO

Data Science and Machine Learning Services

MLOps Enablement, Prototyping, Model Development and Deployment

Strategy Services

Data, Analytics, and AI Strategy, Architecture and Assessments

Reporting, Analytics, and Visualization Services

Self-Service, Integrated Analytics, Dashboards, Automation

Elastic Operations

Data Platforms, Data Pipelines, and Machine Learning