Welcome to the dbt Labs’ Coalesce 2024 recap! I’m Bruno from phData—you might have seen me on LinkedIn posting about dbt. Although I’ve been very engaged with dbt and the community for a while, this was my first time attending Coalesce in person, and I’m so excited to break down all the biggest announcements from the conference!
In this blog, I’ll share my experience attending, highlight some exciting awards, and unpack the biggest updates from the event.
Overall Impression
It was my first time at Coalesce in person (and my first time in the US, by the way), and it was a totally different feeling than watching it online. The venue was great—a Vegas hotel with an entire floor dedicated exclusively to the conference.
There was a large room for vendor booths where you could find all kinds of swag—from coconut water to 3D-printed hero figures of yourself and even magic tricks. There were several rooms for speaking sessions covering customer stories, dbt for practitioners, dbt for enterprises, deeper dives into announcements, AI, and much more.
The only downside was that sometimes I wanted to watch more than one session simultaneously, but at least you can watch them on-demand. The sponsor parties were also excellent, and the Coalesce after-party exceeded my expectations!
But what I liked most was meeting in person a ton of extraordinary people I had previously only known online. I met people from phData, dbt Labs, other companies, vendors, practitioners, and even friendly competitors from all around the world.
I love remote work and wouldn’t change it, but it’s a great feeling to connect with people in person; it’s a different kind of communication—even though I’m terrible at recognizing people from their profile pictures.
Awards
This was the perfect Coalesce for me to attend in person because, first of all, phData won the Partner of the Year Award Overall for the second year in a row!
Secondly, I was one of the individuals who won the dbt Community Award (also for the second time in a row)!
After the community awards announcements, I had the incredible opportunity to join a fireside chat about the community with two other great community award winners, where we discussed the dbt community and our personal journeys and experiences.
Dakota Kelley played a key role in helping us win the Partner of the Year Award and contributed to my own recognition with the community award once again. I also had the privilege of sharing the stage with Dakota during a speaking session titled “Advanced Pipelines in dbt Cloud,” which was a great opportunity to highlight their impact.
And last but certainly not least, what made all these awards even more special was that just days before Coalesce, I got engaged to my girlfriend, Danielly, and that’s a great award for me!
Biggest Announcements at dbt Labs’ Coalesce 2024
Thanks for reading up to here! This is the part you came here for—it’s time to talk about the dbt announcements.
One dbt
The main focus of the whole conference was, without a doubt, around the concept of One dbt.
According to the keynotes and presentations, dbt Labs is now moving towards a path where dbt is no longer just a transformation tool, as it used to be. dbt is becoming one framework that helps everyone, from analysts to decision-makers, at companies of all sizes, whether one person or thousands, and regardless of the platform they use. Whether it’s dbt Core or dbt Cloud, it’s just one dbt.
To accomplish this, dbt has been integrating features that cover what they introduced as the data control plane—features like governance, orchestration, semantics, and catalog—in addition to the transformation part of the pipeline, of course.
Also, dbt is becoming more flexible and cross-platform, more collaborative, and more trustworthy than ever. Those were the words they used to announce the new features coming to dbt Cloud!
Flexible & Cross-Platform
For this category, the announcements are related to interoperability and adding support to more platforms and tools. This makes dbt more flexible, allowing you to work with the tool that fits you the best. For the announcements, we had two big ones that are closely related:
Iceberg Table Support: dbt Cloud now supports Apache Iceberg. Why is this huge? Iceberg is an open-source table format for analytics, and the industry is adopting it as a standard. Customers want to use the same format across platforms, and vendors are listening. And now you can use Iceberg with dbt! You just need to add the table_format configuration to your model. Additionally, you can add the external_volume and base_location_subpath configurations to specify where dbt will write the Iceberg table’s metadata and files. You can read more about the Iceberg configs here.
{{
    config(
        materialized="table",
        table_format="iceberg",
        external_volume="s3_iceberg_snow"
    )
}}
select * from {{ ref('raw_orders') }}
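If you prefer to keep this configuration out of the model file, the same settings can also live in your project’s YAML properties. Here’s a minimal sketch, assuming a hypothetical model named orders:

models:
  - name: orders
    config:
      materialized: table
      table_format: iceberg
      external_volume: s3_iceberg_snow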
The Iceberg support makes it possible for the second announcement to come true:
Cross-Platform dbt Mesh: Until now, dbt Mesh let you share models across different projects on the same data platform. Now, you’ll be able to share models across different data platforms!
That’s the magic of using Iceberg: you can use the same table format on different platforms to reference models across platforms!
To make this work, you need to (see the sketch after these steps):
Integrate both platforms with the same Iceberg catalog.
Configure the upstream model to be public and to write to your Iceberg Catalog.
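Here’s a minimal sketch of what the upstream side might look like. The project and model names are hypothetical, and the exact syntax may evolve as the feature rolls out:

models:
  - name: dim_customers
    access: public              # expose the model to downstream projects
    config:
      materialized: table
      table_format: iceberg     # write to the shared Iceberg catalog
      external_volume: s3_iceberg_snow

The downstream project, on another platform, would then use a regular cross-project reference, like select * from {{ ref('upstream_project', 'dim_customers') }}.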
It will start with Snowflake, Databricks, Redshift, and Athena, but soon support for more platforms will be added.
And here are some other announcements for the flexibility and cross-platform category. dbt is adding support for more platforms and semantic layer connections:
New Integrations: dbt is welcoming AWS Athena (GA) and Teradata (Preview) to the family!
BI Tool Integration: A new dbt Semantic Layer connection to Power BI is coming soon!
Lastly, dbt Labs announced a cost optimization tool integrated into dbt Cloud (coming in 2025) that will let you monitor your costs and get recommendations to reduce them.
Collaborative
The collaborative category of announcements aims to allow more people with different skills and backgrounds to work together in dbt.
Again, for the category, we have two major announcements:
Visual Low-Code Editor: An intuitive, visual drag-and-drop interface (currently in private beta) that allows you to create dbt models without writing SQL. Additionally, you can switch between code and visuals as you please.
For me, this is awesome for a few reasons:
People who don’t write SQL can write dbt models.
You can write your dbt model with SQL as you are used to, but you can explain your model visually for less technical folks and make it much more intuitive.
You can more easily check the output of some parts of your transformation, which is great for debugging and explaining the code to someone else.
You can spot problems in your code that you wouldn’t catch so easily by reading the SQL, like orphaned CTEs. These CTEs, which aren’t referenced in any other part of the code, would appear as a disconnected box in your visualization, as in the sketch below.
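Here’s a contrived example with hypothetical model names, where the discounts CTE is orphaned and would render as a box with no connections:

with orders as (
    select * from {{ ref('stg_orders') }}
),

-- orphaned: defined but never referenced downstream
discounts as (
    select * from {{ ref('stg_discounts') }}
)

select * from orders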
So now people have three ways of creating dbt models and interacting with dbt Cloud, fitting personas from the more technical to the less technical: the dbt Cloud CLI, the dbt Cloud IDE, and the upcoming visual editor. Multiple ways of working in One dbt.
The other announcement for this category was not that new, but it came with some new additions.
dbt Copilot: Your AI assistant (currently in beta)! Copilot was already announced as an AI tool for generating tests, documentation, and semantic models. But at Coalesce 2024, some more features were added:
It can now answer data questions, so even people who aren’t comfortable with SQL can get useful information from the data.
Plus, you can now bring your own OpenAI API key to Copilot!
In summary, these two announcements make it much easier for less technical users to work with dbt and make the work of technical people more accessible and faster.
Trustworthy
The last category covers the announcements that help you understand your data better and get a complete view of your project, increasing your trust in your data.
The major announcements for this category were:
Advanced CI with Compare Changes: This feature gives you detailed insights into how your changes impact your data before deploying to production. You can see what will be added, removed, and changed in your tables so you don’t have any bad surprises in production. Advanced CI also makes it a lot easier for reviewers to review pull requests.
Auto-Exposures with Tableau: Automatically populate your dbt DAG with downstream exposures in Tableau (Power BI support coming soon).
This had already been announced, but some new features are coming. Soon, you’ll be able to see more information in the exposure node, access the Tableau dashboard from dbt Explorer, and embed Data Health Tiles (the next announcement).
Data Health Tiles: Embed health signals like data quality and freshness within any dashboard, giving stakeholders confidence in the data. For example, you can add these dbt Health Tiles to your Tableau Dashboard.
A lot of exciting new stuff is coming to dbt Cloud, and in case you have already forgotten all the announcements, check out this image that summarizes everything.
dbt Core v1.9
Besides the One dbt announcements, some very interesting new features are coming in dbt Core v1.9, which is in beta at the time of writing. Feel free to test it out and give dbt Labs some feedback!
New Snapshots Configuration
Snapshots in dbt are going through some changes. They started in dbt as resources defined in a YML file, then in SQL files inside a Jinja snapshot block, and now they’re returning to their YML origins.
These changes are not coming out of nowhere. There’s been a lot of discussion about snapshots over the last few years, and dbt Labs brought the community heavily into this discussion, which I greatly appreciate. They admit that they sometimes get things wrong and will do their best to fix these problems with the community’s feedback.
So, here’s what’s new for Snapshots:
YML instead of SQL: Snapshots can now be configured in YML, like sources are. So, for example, instead of creating an orders_snapshot.sql file to snapshot your source table for orders, you would create a YML file like this:
snapshots:
  - name: orders_snapshot
    relation: source('jaffle_shop', 'orders')
    config:
      schema: snapshots
      database: analytics
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
Here, you need to define the name, the relation, and the same configurations as before. Then you might ask, “OK, but in the SQL file, I could write a custom transformation for the snapshot. What about now?” The answer is you still can!
By default, this snapshot YML file selects everything from the relation, like a select * from source. And that’s the desired behavior for most snapshots. If you need to change it for any reason, you can define an ephemeral model and reference it in the relation, like this:
-- ephemeral_orders.sql
{{ config(materialized='ephemeral') }}

select *
from {{ source('jaffle_shop', 'orders') }}
where some_condition
Then reference this model in the snapshot’s relation:
snapshots:
  - name: orders_snapshot
    relation: ref('ephemeral_orders')
    config:
      schema: snapshots
      database: analytics
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
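Either way, you still build snapshots exactly as before, running the snapshot command and optionally selecting by name:

dbt snapshot --select orders_snapshot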
target_schema is now optional: This was an old complaint from the community, and dbt Labs fixed it. Before 1.9, all snapshots were written into the same schema no matter the environment, making development harder. No more single snapshot shared between prod and dev: now you can keep separate snapshots for each environment or keep one for all environments. It’s your choice!
When target_schema is omitted, dbt will follow the rules defined by the generate_schema_name and generate_database_name macros.
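For reference, dbt’s default generate_schema_name macro looks roughly like this; you can override it in your project to control where dev and prod snapshots land:

{% macro generate_schema_name(custom_schema_name, node) -%}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name | trim }}
    {%- endif -%}
{%- endmacro %}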
Note in the example file that you can set custom schemas and databases as in any other resource.
Meta column names are customizable: Another old complaint from the community. When your snapshot is materialized, it creates metadata columns such as dbt_valid_from and dbt_valid_to, and we couldn’t change their names. What people usually did was create a view on top of the snapshot and rename the columns. Now, you can rename them in the snapshot YML file using the snapshot_meta_column_names configuration.
snapshots:
  - name: orders_snapshot
    relation: ref('ephemeral_orders')
    config:
      unique_key: id
      strategy: timestamp
      updated_at: updated_at
      snapshot_meta_column_names:
        dbt_valid_from: start_date
        dbt_valid_to: end_date
New Incremental Strategy: Micro-Batching
As for the second major dbt Core v1.9 announcement, dbt Labs launched a new incremental strategy, called micro-batching, based on the experimental insert_by_period strategy.
Before v1.9, if you wanted to run a dbt incremental model, you were only able to run the whole incremental period in one single query. For example, if your incremental model processes one week of data, this whole week would be run in one query.
With micro-batching, you can break this query into micro-batch queries. So you could process your week as multiple daily batches with one single model, just by setting the configs. Let me show you what it would look like:
{{
    config(
        materialized='incremental',
        incremental_strategy='microbatch',
        event_time='_loaded_at',
        batch_size='day',
        lookback=7,
        begin='2024-01-01',
        full_refresh=false
    )
}}
...
You need to:
Set your incremental_strategy as microbatch.
Define an event_time. According to the docs, event_time is the column indicating “at what time did the row occur.” It is required for your microbatch model and any direct parents that should be filtered.
Define the batch_size. The batch_size is the granularity of your batches. For our example, we want to break one week into 7 days, so we choose day. It can be hour, day, month, or year.
Define begin. begin is the start point of the incremental model for initial and full-refresh runs.
Define lookback (optional): lookback is the number of batches, before the current timestamp, that dbt will load.
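Under the hood, dbt runs one query per batch and automatically filters the upstream tables that have an event_time configured. Conceptually, a single daily batch looks something like this (a simplification, with a hypothetical stg_events upstream):

select *
from stg_events                   -- upstream with event_time configured
where _loaded_at >= '2024-09-03'  -- batch start
  and _loaded_at < '2024-09-04'   -- batch end (exclusive)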
Also, micro-batching makes it easy to run backfills. You just need to pass the interval you want to backfill via the command line. For example:
dbt run --event-time-start "2024-09-01" --event-time-end "2024-09-04"
Another cool thing is that if you have some failed batches, you can use dbt retry to rerun only those instead of reprocessing all the data.
--sample (coming up)
Lastly, I just wanted to comment on a feature I’m very excited about. It is not available yet, but it will be in the future.
This is the --sample flag. A lot of developers I know try to avoid high costs in development by running only samples of their tables, and they do it in different ways: they might add the sampling to the SQL code with some if/else blocks, override built-in macros, or do something else.
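As a rough illustration of that hand-rolled pattern (not the upcoming feature itself), here’s one common approach, sampling a hypothetical stg_events model only outside of prod:

select *
from {{ ref('stg_events') }}
{% if target.name != 'prod' %}
limit 1000  -- sample in dev to keep warehouse costs down
{% endif %}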
Fortunately, dbt Labs is working on integrating this sampling capability into dbt Core natively, and I can’t wait to use it!
If you want to see everything new in dbt Core v1.9, check out this page.
Closing Thoughts
Coalesce 2024 was an incredible event, filled with major awards, exciting announcements, and an amazing network of people. It’s hard to pick a favorite moment! I’m eager to dive into all the new features and excited to see where dbt Labs is heading in the future. I hope they continue to strengthen the community and give us all a strong voice moving forward.
And if your organization is looking to make the most of dbt, phData is ready to assist. As dbt’s Partner of the Year, we have the expertise to ensure your dbt setup is optimized and powerful, driving your organization forward. If you are still in doubt, check out our whitepaper on accelerating and scaling dbt for enterprise.