In this data-driven world, have you ever wondered where Tableau stores your data? If you are still looking for an answer to this question, you have come to the right place. Before looking at how the data is stored, let us learn a bit about Tableau.
Tableau is one of the most popular data visualization and business intelligence tools that help people see and understand their data. It lets you build interactive charts and graphs in worksheets and dashboards to get meaningful insights from your data, analyze trends visually, and make quick decisions.
In this blog, we take a deep dive into data in Tableau: what kinds of data Tableau stores and where it stores them.
Tableau Architecture
Let’s start with Tableau architecture, which makes it easier to understand where Tableau data is stored.
Tableau architecture has five main components:
1. Data Server (data sources): These are the databases, files, and data warehouses that a dashboard connects to in order to render its visuals.
2. Data Connector: The Data Connectors provide an interface to connect external data sources with the Tableau Data Server.
3. Components of Tableau Server:
a. Application Server: The Application Server handles authentication and authorization.
b. VizQL Server: The VizQL Server translates user actions into queries against the data source and turns the query results into visualizations.
c. Data Server: The Data Server stores and manages the data sources that connect to external data.
4. Gateway: The gateway routes each user request to the appropriate Tableau Server component, depending on the action requested.
5. Clients: The visualizations and dashboards on Tableau Server can be viewed and edited using different clients, such as web browsers, mobile applications, and Tableau Desktop.
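As a rough mental model only (this is not Tableau code), the gateway's job can be pictured as a lookup from the kind of request to the component that handles it. The request names and the routing table below are hypothetical and exist purely to illustrate the roles described above:

```python
# Toy model of the gateway described above. The request types and the
# routing table are hypothetical; Tableau Server's real routing is internal.
ROUTES = {
    "sign_in": "Application Server",      # authentication and authorization
    "render_view": "VizQL Server",        # queries data and draws the visuals
    "query_data_source": "Data Server",   # manages published data sources
}


def route(request_type: str) -> str:
    """Return the component that should handle a given request type."""
    try:
        return ROUTES[request_type]
    except KeyError:
        raise ValueError(f"unknown request type: {request_type!r}") from None


if __name__ == "__main__":
    for req in ROUTES:
        print(f"{req} -> {route(req)}")
```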
Tableau Data Sources
A Tableau data source is the link between your source data and Tableau. It consists of your data (either as a live connection or an extract), the connection information, the names of the tables or sheets that contain the data, and any customizations you make on top of the data to work with it in Tableau.
A Tableau data source may contain multiple data connections to different databases or files. The connection information includes where the data is located, such as a file name and path or a network location, and details on how to connect to the data, such as the database server name and sign-in information.
Each data source can use either a Live connection or an Extract. Let’s look at what these two terms mean.
- Live Connection: As the name suggests, a Live connection queries the data in real time. A dashboard that uses a Live connection always renders the current data in the database at the moment it is queried.
- Extract Connection: An Extract, on the other hand, is a snapshot of the data. When the extract is created or refreshed, Tableau pulls the latest data from the database at that point in time and saves it to a file. The file is created on your local machine or on Tableau Server with a .hyper extension (or .tde, the Tableau Data Extract format used before Tableau 10.5). These files contain the data used to render your Tableau dashboards, and they can be refreshed on a schedule or on demand.
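To make the difference concrete, here is a toy model in Python. It is not how Tableau implements connections: an in-memory SQLite database stands in for the source database, the "extract" is simply a list of rows copied out of it, and the table and function names are hypothetical.

```python
import sqlite3
from typing import List, Tuple

# Toy model: SQLite stands in for the source database.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
source.execute("INSERT INTO sales VALUES ('East', 100)")
source.commit()


def live_total(conn: sqlite3.Connection) -> int:
    """Live connection: query the source every time."""
    return conn.execute("SELECT SUM(amount) FROM sales").fetchone()[0]


def create_extract(conn: sqlite3.Connection) -> List[Tuple[str, int]]:
    """Extract: copy the current rows into a point-in-time snapshot."""
    return conn.execute("SELECT region, amount FROM sales").fetchall()


def extract_total(snapshot: List[Tuple[str, int]]) -> int:
    """Query the snapshot instead of the source."""
    return sum(amount for _, amount in snapshot)


extract = create_extract(source)
source.execute("INSERT INTO sales VALUES ('West', 50)")  # new data arrives
source.commit()

print(live_total(source))         # 150: the live query sees the new row
print(extract_total(extract))     # 100: the snapshot is stale
extract = create_extract(source)  # "refresh" the extract
print(extract_total(extract))     # 150
```

The trade-off shows up directly: the live query is always current but hits the database each time, while the snapshot is cheap to read but only as fresh as its last refresh.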
Tableau Data Engine
Hyper is Tableau’s in-memory Data Engine technology optimized for fast data ingestion and analytical query processing on large or complex data sets.
Hyper powers the Data Engine in Tableau Server, Tableau Desktop, Tableau Cloud, and Tableau Public. The Data Engine is used when creating, refreshing, or querying extracts.
Here are some reasons why the Data Engine powered by Hyper performs better on larger or more complex extracts and is optimized for faster querying:
Hyper technology is designed to consume data faster.
Hyper technology is memory optimized.
Hyper technology is CPU optimized.
Hyper is a compiling query engine.
Hyper technology uses advanced query optimizations to make queries faster.
Views published to Tableau Server are interactive and sometimes use a live connection to a database. As users interact with the views in a web browser, the queried data is stored in a cache, and subsequent visits pull the data from this cache when it is available. This is called data caching in Tableau.
Where does Tableau Data Engine store data?
As discussed above, the Tableau Data Engine is responsible for creating, refreshing, and querying extracts. Extracts are files with a .hyper (or, in older versions, .tde) extension. In Tableau Desktop, these files are stored on the local machine and read whenever a dashboard that uses them is opened.
Tableau Server and Tableau Cloud
Tableau Server and Tableau Cloud (Online) are complementary products for publishing, sharing, and distributing Tableau workbooks and data sources. Both work in the same way; the difference lies in how they are hosted and managed.
Tableau Server is hosted and maintained by your company behind its own firewall, either on-premises or on cloud infrastructure. Tableau Cloud, by contrast, is a SaaS version of Tableau Server that is fully managed by Tableau.
You can refer to our blog on “Tableau Server vs. Tableau Cloud (Online): What’s Best for Your Business?” to understand more about this.
When it comes to data storage in Tableau Server or Tableau Cloud, two kinds of data are stored: a) repository data and b) object data. Let’s look at where each is stored.
Tableau Repository Data: The Tableau Server repository is a database that stores server data. This includes information about Tableau Server users, groups and group assignments, permissions, projects, data sources, workbooks, and extract metadata and refresh information. This data is stored in a PostgreSQL database that ships with Tableau Server and runs as part of the server installation.
Tableau Object Data: Object data refers to the data behind dashboards, including user-specific data. As discussed earlier, this data can be either Live or Extract. Live data stays in the source databases and is fetched directly when a dashboard is opened. Extract data is stored on the Tableau Server infrastructure nodes as .hyper or .tde files; when a dashboard is opened, the data in these files is loaded to render the visuals. Tableau Server also keeps queried data in an in-memory cache, which is stored as temporary (.tmp) files on the server for faster loading. You can configure how long cached data is reused with the following command on Tableau Server:
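tsm data-access caching set -r <value>
where <value> sets the refresh frequency: low (the default) reuses cached data for as long as possible, always (or 0) refreshes the data every time a page is reloaded, and a number sets the maximum number of minutes that cached data is reused. Run tsm pending-changes apply afterwards for the change to take effect.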
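To picture what the refresh setting controls, here is a toy cache in Python. It is only a sketch of the idea, not Tableau Server's implementation: the class name QueryCache and its parameters are hypothetical, with refresh_minutes=None standing in for low, 0 for always, and a positive number for a refresh interval in minutes.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class QueryCache:
    """Toy cache keyed by query text (hypothetical, for illustration).

    refresh_minutes=None -> like "low": reuse cached results as long as possible
    refresh_minutes=0    -> like "always": run the query every time
    refresh_minutes=N    -> reuse results until they are N minutes old
    """

    def __init__(self, refresh_minutes: Optional[float],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.refresh_minutes = refresh_minutes
        self.clock = clock  # injectable so the example can fake the time
        self._entries: Dict[str, Tuple[float, object]] = {}

    def _is_fresh(self, age_seconds: float) -> bool:
        if self.refresh_minutes is None:
            return True
        return age_seconds < self.refresh_minutes * 60

    def get(self, query: str, run_query: Callable[[], object]) -> object:
        now = self.clock()
        entry = self._entries.get(query)
        if entry is not None and self._is_fresh(now - entry[0]):
            return entry[1]  # cache hit
        result = run_query()  # cache miss or stale entry
        self._entries[query] = (now, result)
        return result


if __name__ == "__main__":
    fake_now = [0.0]
    cache = QueryCache(refresh_minutes=5, clock=lambda: fake_now[0])
    calls = []

    def run_query():
        calls.append(fake_now[0])
        return f"result #{len(calls)}"

    print(cache.get("SELECT ...", run_query))  # result #1 (miss)
    fake_now[0] = 2 * 60
    print(cache.get("SELECT ...", run_query))  # result #1 (2 min old, reused)
    fake_now[0] = 7 * 60
    print(cache.get("SELECT ...", run_query))  # result #2 (7 min old, refreshed)
```

A shorter interval means fresher data but more queries against the source; a longer one (or low) means faster views that may show older data.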
Tableau Desktop
Tableau Desktop is the desktop client you use to create new dashboards in Tableau. To create a dashboard, you first connect to the data that will be rendered as visuals. Data connected in Tableau Desktop is stored either in a file on your local drive or in another database system.
With a Live connection, the data stays in that file or database. When you create an Extract, you can save the data as a .hyper file in any location you choose.
Another scenario is opening a packaged workbook in Tableau Desktop. A packaged workbook has a .twbx extension and contains the workbook along with a copy of any local file data sources and background images.
Because these files are copied into the package, the workbook is no longer linked to the original data sources and images. If your packaged workbook contains Live connections, the data displayed in the dashboard is stored as .tmp or .temp files under the Local AppData folder in your user profile.
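A .twbx file is a zip archive, so you can look inside one to see which local data files were packaged with the workbook. The Python sketch below uses only the standard library; the function name inspect_twbx and the grouping of extensions are our own assumptions, and the folder layout inside the archive may vary between Tableau versions.

```python
import zipfile
from pathlib import Path
from typing import Dict, List, Union

# Hypothetical grouping of file extensions, for illustration only.
WORKBOOK_EXTS = {".twb"}
DATA_EXTS = {".hyper", ".tde", ".csv", ".xls", ".xlsx", ".json"}


def inspect_twbx(path: Union[str, Path]) -> Dict[str, List[str]]:
    """Group the members of a .twbx (zip) archive into workbook, data, and other files."""
    groups: Dict[str, List[str]] = {"workbook": [], "data": [], "other": []}
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            suffix = Path(name).suffix.lower()
            if suffix in WORKBOOK_EXTS:
                groups["workbook"].append(name)
            elif suffix in DATA_EXTS:
                groups["data"].append(name)
            else:
                groups["other"].append(name)  # e.g. background images
    return groups


if __name__ == "__main__":
    import sys

    # Usage: python inspect_twbx.py MyWorkbook.twbx
    for arg in sys.argv[1:]:
        for kind, names in inspect_twbx(arg).items():
            print(f"{kind}: {names}")
```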
Tableau Public
Tableau Public is a free platform to explore, create, and publicly share data visualizations online. Tableau Public is limited to 15 million rows of data per workbook. Tableau Public workbooks and data are not private: anyone can access them freely.
Workbooks published to Tableau Public cannot use Live connections; they must use Extracts. The data used by these workbooks is stored in the cloud as a .hyper extract.
Best Practices for Data Storage in Tableau
Tableau generally recommends Extract connections over Live connections when performance is the priority, because extracts are optimized for fast querying. With an Extract, the data is stored as a file.
Beyond storing data as files, you also need to consider where those files are kept, both for performance and for protection against loss. Common storage locations include:
Desktops/laptops
Networked drives
External hard drives
Optical storage
Cloud storage
Flash drives (while a simple method, remember that they do degrade over time and are easily lost or broken)
A simple, commonly used backup strategy is the 3-2-1 rule: keep 3 copies of your data, on 2 different types of storage, with 1 copy stored offsite.
This approach keeps the data accessible and ensures a copy is always available if one storage type or location is lost or destroyed, without being overly redundant or complicated.
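The 3-2-1 rule is easy to check mechanically. Here is a minimal Python sketch; the Copy record and its fields are hypothetical, so adapt them to however you track where your extracts and workbooks are stored.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Copy:
    """One stored copy of your Tableau files (hypothetical record)."""
    location: str      # e.g. "laptop", "office NAS", "cloud bucket"
    storage_type: str  # e.g. "internal disk", "network drive", "cloud"
    offsite: bool


def satisfies_3_2_1(copies: List[Copy]) -> bool:
    """True if there are >= 3 copies, on >= 2 storage types, with >= 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.storage_type for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


if __name__ == "__main__":
    copies = [
        Copy("laptop", "internal disk", offsite=False),
        Copy("office NAS", "network drive", offsite=False),
        Copy("cloud bucket", "cloud", offsite=True),
    ]
    print(satisfies_3_2_1(copies))      # True
    print(satisfies_3_2_1(copies[:2]))  # False: only two copies, none offsite
```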
Closing
Whether you are using Tableau Desktop, Tableau Server, Tableau Cloud, or Tableau Public, the data is stored in only two ways. It is either kept in the databases and fetched as Live data, or stored as .tde/.hyper files, which are called Extract data. These extract files can be stored anywhere: on your local drive, server hosts, cloud storage, network drives, and so on.
If you have more questions and want to dive deeper into where and how data is stored, don’t hesitate to contact our Tableau experts.
Tableau Desktop is used to create and edit dashboards in Tableau. When you extract the data connected to it, the extracts are compressed snapshots of the data that are stored on your local drive and loaded when required.
By default, Tableau Desktop saves its files in the My Tableau Repository folder under Documents unless you specify another path explicitly. These files can be .twb, .twbx, .tds, .tdsx, .tde, or .hyper files.
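To see which Tableau files are sitting in that folder, a short Python script can scan it. This sketch assumes the default Documents/My Tableau Repository location under your home folder; if your Documents folder is redirected (for example, to OneDrive) or you saved files elsewhere, pass a different path. The function names are hypothetical.

```python
from pathlib import Path
from typing import Dict, List

# File types listed above.
TABLEAU_EXTS = (".twb", ".twbx", ".tds", ".tdsx", ".tde", ".hyper")


def default_repository() -> Path:
    """Assumed default: Documents/My Tableau Repository in the user's home folder."""
    return Path.home() / "Documents" / "My Tableau Repository"


def find_tableau_files(root: Path) -> Dict[str, List[Path]]:
    """Recursively collect Tableau files under root, grouped by extension."""
    found: Dict[str, List[Path]] = {ext: [] for ext in TABLEAU_EXTS}
    if not root.is_dir():
        return found  # nothing to scan
    for path in root.rglob("*"):
        ext = path.suffix.lower()
        if ext in found and path.is_file():
            found[ext].append(path)
    for paths in found.values():
        paths.sort()  # deterministic order for display
    return found


if __name__ == "__main__":
    for ext, paths in find_tableau_files(default_repository()).items():
        print(f"{ext}: {len(paths)} file(s)")
```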