dbt’s SQL-based approach democratizes data transformation. However, python and other programming languages edge out SQL with its metaprogramming capabilities. dbt’s Jinja integration bridges the gap between the expressiveness of Python and the familiarity of SQL.
Jinja’s usage will significantly empower you to build dynamic and reusable data pipelines, especially when dealing with conditional logic and templatization within dbt.
In this blog, we will extract the essence of Jinja in dbt.
What is Jinja?
Jinja is a powerful and versatile templating engine. Jinja acts as embedded mini-programs, boosting code dynamism and efficiency. Think of it as a flexible tool that separates the static layout of your code from the dynamic data that fills it. Instead of hardcoding, you can use Jinja’s expressions to integrate data seamlessly.
Pairing Jinja with dbt
This section will dive deep into the powerful synergy between Jinja and dbt.
We are going to explore the following topics in detail:
Parameterization
Decision-making constructs
Iteration
Dynamic code building blocks
Methods
Parameterization: Allowing Models to Adapt Based on Input Values
Jinja’s variables store frequently used values, environment-specific values, and table prefixes in your dbt_project.yml
file. The variable can be assigned as strings, lists, and dictionaries.
{% set schema = var("my_schema") %}
--var keyword will fetch varible from dbt_project.yml file
{% set states = ['Victoria', 'Queensland', 'Tasmania'] %}
{% set countries = {"state_countries": ['India','United States of America'] ,
"province_countries": ['China','Canada'] }
%}
Jinja provides default variables that will contain predefined values based on the profile.Â
A few commonly used default variables are:
Variable | Description | Variable types | Value |
---|---|---|---|
ref |
Reference one model with another | ref('model_name') |
latest model version |
ref('model_name', version=2) |
Custom model version | ||
target |
Contains connection information | target.profile_name |
active profile |
target.name |
active target | ||
target.schema |
dbt schema name | ||
target.type |
active adapter | ||
target.threads |
number of threads | ||
this |
Database representation of the current model | this |
fully qualified table name |
this.database |
database name | ||
this.schema |
schema name |
The variables can also be passed by using the --vars
command line option as a key-value pair. The variables defined with the --vars
command line argument have the highest order of precedence.
dbt build -m model_name --vars '{"schema": "prod"}'
Decision-making Constructs: The Ability to Make Logic-based Choices
Jinja’s if statements help the models to adapt to the circumstances. Tailor the data transformations depending on dynamic conditions or even build different pipelines for different environments.
{% if target.name == 'prod' %}
{% if country in state_countries %}
SELECT
state_name, district_name, zip_code
from {{ var("my_schema") }}.country
{% elif country in province_countries %}
SELECT
province_name, county_name, postal_code
from {{ var("my_schema") }}.country
{% endif %}
{% else %}
select * from {{ var("my_schema") }}.country
{% endif %}
Jinja is loaded with various operators to support the conditional statements.
Logic Operators | Comparison Operators |
---|---|
|
|
Â
Iteration: Repeated Execution of a set of Instructions
Jinja’s for loop provides the ability to automate repetitive tasks, eliminating boilerplate code and streamlining your data transformations. It lets you iterate over lists and dictionaries to dynamically construct SQL statements on the fly, for loops improve the code readability immensely. The snippet below shows the use of a for loop by iterating over a dictionary and list.
{% set countries = {"state_countries": ['India','United_States_of_America'] ,
"province_countries": ['China','Canada'] }
%}
{% for key, value in countries.items() %}
{% if key == 'state_countries' %}
{% for state in value %}
SELECT
state_name, county_name, age_group,
access_to_electricity, access_to_technology
FROM {{ var("my_schema") }}.{{ state }}_stats
{% if not loop.last %} union all {% endif %}
{% endfor %}
{% endif %}
{% endfor %}
Jinja provides multiple looping properties to unlock more use cases.
Property | Data Type | Description |
---|---|---|
loop.first | Boolean | Returns True during the first iteration |
loop.last | Boolean | Returns True in the previous iteration |
loop.index | INT | Returns the loop iteration number. Start with 1 . |
Dynamic Code Building Blocks: Ability to Create Dynamic Code
Jinja’s macros are predefined blocks of Jinja code that you can reuse throughout your dbt models.Â
Major benefits of macros:
Repetitive Code Reduction: Eliminates the need to copy and paste the same logic across different models.
Improve Code Maintainability: Updating common transformations in one place.
Increase Code Modularity: This breaks down complex logic into smaller reusable units, leading to more organized code.
-- Add the code to a .sql on the macros folder
{% macro audit_columns() %}
current_timestamp as load_timestamp,
current_date as batch_date
{% endmacro %}
The macro can be used in other models and macros.
select *
, {{ audit_columns() }}
FROM {{ var("my_schema") }}.country
Remember, Jinja compiles the code to plain SQL. Use the dbt compile
command to check the compiled code. Ensure that the syntax and logic are correct.
--compiled query
select *
, current_timestamp as load_timestamp,
current_date as batch_date
FROM continent.country
The macro can be executed independently as well with the dbt run-operation
command.
{% macro create_new_schema(schema) %}
create schema if not exists {{ schema} };
{% endmacro %}
dbt run-operation create_new_schema --args '{"schema": "new_schema"}'
command will execute the macro.
Methods: Transformation and Formatting Utilities
Jinja streamlines your data transformations by offering built-in methods for handling both text and numbers.
Method | Syntax | Output/ Description |
Split | "dbt transformation".split(' ') | ["dbt," "transformation"] |
Replace | "dbt" | replace("d", "t") | "dbt" |
Trim | "dbt  " | trim | "dbt" |
Length | "dbt" | length | 3 |
Title | "dbt transforms data" | capitalize | "Dbt transforms data" |
Capitalize | "dbt" | capitalize | "Dbt" |
Upper | "dbt" | upper | "DBT" |
Lower | "DBT" | lower | "dbt" |
print("dbt is an open source tool") | dbt is an open-source tool | |
String | 11 | string | "11" |
Int | "11" | int | 11 |
Float | 11 | float | 11.0 |
Round | 11.123 | round(1) | 11.1 |
Command block | {# comment line #} | Codes can be documented using the command line block |
Statement block | {% set dbt = "jinja" %} | Jinja functions will be defined inside a statement block |
Template block | {{ ref("model") }} | Methods and variables need to be passed inside a template block |
Whitespace removal | {%- set dbt = "jinja" -%} | - is used for removing white spaces. This will make the compiled code look cleaner. |
Conclusion
Jinja offers a dynamic toolkit that enhances your dbt models and elevates our data-wrangling skills. We can unlock valuable insights and drive data-driven decisions efficiently by wielding Jinja’s power.
If you have any further questions regarding how to ace dbt with Jinja, reach out to our team of dbt experts!