Coalesce is quickly becoming the go-to ETL tool because it blends a code-first approach with a low-code/no-code interface. This combination offers flexibility and user-friendliness while delivering speed, scalability, and robust governance. Its easy handling of complex data environments makes it an ideal choice for modern enterprises.
In this blog, we will walk through setting up GitHub within your Coalesce environment and mapping your GitHub repo to your Coalesce project. We’ll also explore Git essentials within your Coalesce workspace, including a branching strategy, creating branches, checking out branches, committing code changes, and merging, all inside your Coalesce environment.
Why Is Setting Up Git in Coalesce Important?
Coalesce supports Git at the project level: every project must be mapped to a repository. Otherwise, you cannot create workspaces or build pipelines. Setting up GitHub for Coalesce is necessary because the repository serves as a centralized platform for managing the project’s pipeline.
Here’s why it matters:
Streamlined Development.
Enhanced Collaboration.
Project Management.
Automation and Continuous Integration.
Supported Git Providers
Setting up GitHub on Coalesce Environment
Log in to your GitHub account, or sign up if you don’t have one. We will explore the following options and how they map to Coalesce.
Creating a GitHub Repository
A repository is essentially a folder or directory containing all the files and resources associated with a project, including code and configuration files. When you create a project in Coalesce, you’ll need to provide the URL of the remote repository so Coalesce can keep it updated with the changes you make.
If you are a new GitHub user, you can use the Create Repository option to create a new repo.
If you are an existing GitHub user, open the Repositories tab and click the New icon to create a new repository.
We are creating a “Coalesce-Lab” repository, which will be mapped to our Coalesce project.
When you create a repository, you choose its visibility. A private repository can be viewed and contributed to only by users you grant access to. If you need broader visibility, you can set the repository to public, allowing anyone to view it while you still control who can contribute. Understanding these settings helps you manage access and collaboration effectively.
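If you prefer to script repository creation instead of clicking through the GitHub UI, the sketch below builds the HTTP request for GitHub's REST endpoint that creates a repository for the authenticated user. It is only an illustration: the helper name `build_create_repo_request` and the `<your-token>` placeholder are assumptions, the token must be allowed to create repositories, and the request is built but deliberately not sent.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(name: str, token: str, private: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /user/repos request.

    `name` is the repository name, `token` is a GitHub access token
    (placeholder here), and `private` controls the repository visibility.
    """
    payload = json.dumps({"name": name, "private": private}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GITHUB_API}/user/repos",
        data=payload,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


# Hypothetical usage: "Coalesce-Lab" is the repository name used in this blog.
request = build_create_repo_request("Coalesce-Lab", token="<your-token>")
print(request.get_method(), request.full_url)
# To actually create the repository, pass the request to urllib.request.urlopen.
```

Keeping the request construction separate from sending it makes it easy to inspect the payload, including the visibility flag, before anything is created on GitHub.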
Adding Git Account Details in Coalesce Environment
Next, we will map the Git repo to the Coalesce project. Before mapping, configure your Git credentials in the Coalesce User Settings as described below.
Click the user icon at the top of the Coalesce app. In the left pane, select User Settings and then Git Accounts. Use the Add icon to add your Git account details and credentials.
To set up GitHub on Coalesce, you need a GitHub fine-grained personal access token. When generating it, select which repositories you want Coalesce to have Read and Write access to. Follow the instructions in the link to generate a token in GitHub. Then provide your GitHub credentials in Coalesce (nickname, username, token, author name, and email address) and click Add.
Mapping Repository in Coalesce Environment
As shown below, every project must be mapped to a repository. Otherwise, you cannot create workspaces or build pipelines. To configure a repository for an existing project, click Configure Git Account and provide the repository details. If you are creating a new project, click the + sign and follow the configuration steps.
In GitHub, under Code and then Clone, copy the HTTPS URL and provide it as the Git repository URL in your project creation step.
In the next step, select the Git account (created when you added your Git account details in the user settings), then test the connection to confirm that Coalesce can reach GitHub. Once the test succeeds, click Finish.
Now, you have successfully mapped your Coalesce project to a GitHub repository.
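A malformed repository URL is an easy mistake to make when copying it from GitHub. As a rough pre-check, the sketch below parses a GitHub HTTPS clone URL into its owner and repository name and rejects anything else, such as an SSH-style URL. The function name, the regular expression, and the `example-org` owner are illustrative assumptions, not part of Coalesce or GitHub.

```python
import re
from typing import NamedTuple, Optional

# Accepts https://github.com/<owner>/<repo> with an optional ".git" suffix.
_HTTPS_REPO = re.compile(
    r"^https://github\.com/(?P<owner>[A-Za-z0-9-]+)/(?P<repo>[A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


class RepoRef(NamedTuple):
    owner: str
    repo: str


def parse_github_https_url(url: str) -> Optional[RepoRef]:
    """Return the owner and repository name, or None if the URL is not an HTTPS GitHub URL."""
    match = _HTTPS_REPO.match(url.strip())
    if match is None:
        return None
    return RepoRef(match["owner"], match["repo"])


# "example-org" is a made-up owner; "Coalesce-Lab" is the repository from this blog.
print(parse_github_https_url("https://github.com/example-org/Coalesce-Lab.git"))
print(parse_github_https_url("git@github.com:example-org/Coalesce-Lab.git"))  # SSH form is rejected
```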
When configuring a Git account in Coalesce, you might encounter issues such as authentication errors, incorrect repository URLs, or permission problems. If you face authentication errors, double-check that your username and access token are entered correctly and that the token has not expired. For issues with repository URLs, ensure they are accurate and properly formatted. If you encounter permission errors, verify that your token has the necessary access rights to the repository.
Understanding Commit in Coalesce
A commit is a snapshot, or record, of the changes made to the files in a Git repository at a specific time. When you commit changes in Git, you save the current state of your files along with a commit message.
To commit your changes in Coalesce, click the Git integration icon in the bottom-left pane of the dashboard.
Clicking it opens the window shown below. Here is an overview of its sections.
Current Branch – This is the branch you have worked on in the development environment.
Changes – These are the files that have changed. They are separated into one file per Node plus a data.yml file, which includes information such as Jobs and Macros.
Manage Changes:
Stage All – All files under changes are selected for the next commit.
Unstage All – No files are selected for the next commit. You’ll need to select the files you want to commit. At least one file needs to be selected to make a commit.
Discard All – Discard all changes that were made to the Workspace.
File Differences – Any changes to a Node or other metadata will be here. Green means additions, and red means deletions. See Viewing File Differences for an example.
Commit Message – A commit message describes the changes that were made.
Fetch – Fetch gets all changes from the repository, including any changes made to the branch you are working on.
Commit & Push – Commit and push your changes to the repo with a commit message.
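The staging options above can be pictured with a small toy model. This is not Coalesce's implementation, just a sketch of the semantics described in this section: the `ChangeSet` class and the node file name are hypothetical, while `data.yml` is the metadata file mentioned above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ChangeSet:
    """Toy model of the Changes list and the files staged for the next commit."""

    changed: set[str] = field(default_factory=set)
    staged: set[str] = field(default_factory=set)

    def stage_all(self) -> None:
        # Stage All: every changed file is selected for the next commit.
        self.staged = set(self.changed)

    def unstage_all(self) -> None:
        # Unstage All: nothing is selected; the changes themselves are kept.
        self.staged.clear()

    def stage(self, path: str) -> None:
        if path not in self.changed:
            raise KeyError(f"{path} has no changes")
        self.staged.add(path)

    def discard_all(self) -> None:
        # Discard All: the changes themselves are thrown away.
        self.changed.clear()
        self.staged.clear()

    def commit(self, message: str) -> dict:
        # At least one file must be selected to make a commit.
        if not self.staged:
            raise ValueError("select at least one file to commit")
        files = sorted(self.staged)
        self.changed -= self.staged
        self.staged = set()
        return {"message": message, "files": files}


changes = ChangeSet(changed={"data.yml", "nodes/FACT_CUSTOMER_ORDERS.yml"})
changes.unstage_all()
changes.stage("data.yml")
print(changes.commit("Update job definitions"))
print(changes.changed)  # the unselected node file is still pending
```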
Understanding Branch and Branch Actions in Coalesce
A branch is a separate development line within a Git repository that allows you to work on new features, bug fixes, or experiments without affecting the main codebase. It’s like a parallel version of your project’s code, where you can make changes independently from the main branch.
For best practices, use descriptive and consistent naming conventions (e.g., feature/feature-name or bugfix/issue-description) to clearly indicate the branch’s purpose. Also, regularly update branches with the latest changes from the main branch to avoid conflicts and ensure smooth integration.
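One way to keep branch names consistent is to generate them from a short description. The helper below is a minimal sketch of the feature/ and bugfix/ convention mentioned above. The function name and the exact slug rules (lowercase, hyphen-separated) are assumptions for illustration, so adapt them to your team's convention.

```python
import re

ALLOWED_PREFIXES = ("feature", "bugfix")


def branch_name(kind: str, description: str) -> str:
    """Build a branch name such as 'feature/customer-orders-fact'."""
    if kind not in ALLOWED_PREFIXES:
        raise ValueError(f"kind must be one of {ALLOWED_PREFIXES}")
    # Lowercase the text and collapse anything that is not a letter or digit into a hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    if not slug:
        raise ValueError("description must contain letters or digits")
    return f"{kind}/{slug}"


print(branch_name("feature", "Customer Orders Fact"))
print(branch_name("bugfix", "Null keys in FACT_CUSTOMER_ORDERS"))
```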
Whenever you create a new workspace in a Coalesce project, you must specify the main branch and the new feature/working branch where you will do all your development, then merge to the main branch at the end.
Creating a New Branch and Check-Out
Creating a new branch in Coalesce takes only a few clicks. The Git integration panel, where we looked at commits, has another option called Branch. When you click Branch, you can see options for creating a new branch from previous commits, highlighted below.
Once you choose to create a new branch, a dialog shows the commit from which you are creating it. Click Branch and Checkout to create the new branch and check it out.
You can also check out the latest commit of an existing branch by selecting the branch from the Selected Branch dropdown and clicking Check Out Latest.
Merging Branch
After making changes and committing to a branch, you can merge the changes back to another branch.
Check out your destination branch to make it your Current Branch.
Select the source branch with the incoming changes from the Selected Branch dropdown.
Click Merge Latest (1), or click Merge (2) next to the commit you want to merge into the current branch.
Merge Latest: This action will integrate the most recent commit with the Dev_Practice branch.
Merge: This action allows you to select a specific commit to be merged into the Dev_Practice branch.
All the changes from the Selected Branch will be merged into your Current Branch.
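For readers who know the Git command line, the two merge actions correspond roughly to the commands sketched below. Coalesce performs the merge through its UI, and this blog does not state which commands it runs internally, so treat this as an analogy only. The helper name, the source branch `feature/new-node`, and the commit id `abc1234` are hypothetical.

```python
from typing import List, Optional


def merge_commands(current_branch: str, selected_branch: str,
                   commit: Optional[str] = None) -> List[List[str]]:
    """Return the git commands that approximate a merge into the current branch.

    With no `commit`, the tip of the selected branch is merged (like Merge Latest).
    With a `commit`, that specific commit is merged instead (like Merge).
    """
    target = commit if commit is not None else selected_branch
    return [
        ["git", "checkout", current_branch],  # make the destination the current branch
        ["git", "merge", target],             # bring in the incoming changes
    ]


for command in merge_commands("Dev_Practice", "feature/new-node"):
    print(" ".join(command))
for command in merge_commands("Dev_Practice", "feature/new-node", commit="abc1234"):
    print(" ".join(command))
```

The function only builds the command lists; running them (for example with `subprocess.run`) is left out so the sketch has no side effects.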
Handling Merge Conflicts
A merge conflict occurs when two or more branches make conflicting changes to the same part of a file, and Git cannot reconcile these differences automatically. This typically happens during a merge operation when combining branches.
In the example below, we have a branch dev_practice and created another branch, staging_fixes, from dev_practice.
Some changes were made in the FACT_CUSTOMER_ORDERS node, committed to dev_practice, and merged into the main branch. We then checked out the staging_fixes branch, made changes to the same FACT_CUSTOMER_ORDERS node, and committed them to that branch. During the merge, merge conflicts popped up.
To resolve conflicts, review every section enclosed by the conflict markers (<<<<<<<, =======, and >>>>>>>), decide which side’s changes to keep and which to remove, and make sure you delete all the markers before completing the merge.
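Leftover conflict markers are easy to miss in a long file. The sketch below scans text for the standard Git conflict markers and reports the line numbers where they appear. The function name and the sample file content are made up for illustration.

```python
from typing import List, Tuple


def find_conflict_markers(text: str) -> List[Tuple[int, str]]:
    """Return (line_number, marker) pairs for any Git conflict markers left in the text."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("<<<<<<<"):
            hits.append((number, "<<<<<<<"))
        elif line.rstrip() == "=======":
            # The separator is a line of exactly seven equals signs.
            hits.append((number, "======="))
        elif line.startswith(">>>>>>>"):
            hits.append((number, ">>>>>>>"))
    return hits


# Hypothetical node file content with an unresolved conflict.
sample = "\n".join([
    "name: FACT_CUSTOMER_ORDERS",
    "<<<<<<< HEAD",
    "description: orders per customer",
    "=======",
    "description: customer order facts",
    ">>>>>>> staging_fixes",
])

for line_number, marker in find_conflict_markers(sample):
    print(f"line {line_number}: {marker}")
```

An empty result means no markers remain, which is a reasonable sanity check before completing the merge.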
After resolving merge conflicts, you can verify that everything was handled correctly by running automated tests and performing a thorough code review. Use the git diff and git status commands to check for any remaining discrepancies or unresolved issues before finalizing the merge.
Best Practices
Committing code in small, easily understood, and tested batches decreases the likelihood of integration conflicts. Keeping feature branches separated from the main branch reduces the chance of conflicts with team members’ changes.
Working in small batches allows us to keep commits simple and divide our work into single units to solve specific tasks. This approach helps us complete tasks faster and revert changes easily without unintended side effects. Within Coalesce, any work done between commits is available in the Git modal. Changing multiple files is possible, so when making your next commit, remember that Coalesce will include all files by default unless manually deselected. This ensures that changes are organized and committed efficiently.
Use branches to develop your data pipelines effectively. Branches let you organize development tasks by separating work in progress from the stable, tested code in the main branch, and they make it easy to merge changes back into the main branch. Within Coalesce, this approach streamlines development, ensuring that your data pipeline evolves smoothly and efficiently.
Utilizing a branching strategy in Coalesce presents two primary approaches:
Leverage Singular Workspace: All work is branched within a single workspace in this method. This means that at any given time, only one branch is being worked on within the workspace, even if multiple users collaborate. While this strategy is straightforward, it may lead to conflicts and confusion, especially when multiple developers work on different features or fixes.
Create Workspace for Each Unit of Work: Alternatively, developers can create a separate workspace for each unit of work they’re performing, attaching a branch to it that is separate from the main branch. With this approach, each workspace is dedicated to a specific task, eliminating the need to switch branches within the workspace. The main or master workspace is the central hub, with additional workspaces and branches easily created. Coalesce adopts this strategy for developing data pipelines, ensuring efficient workflow management.
Adopting the second strategy is a best practice because it provides a clear and organized method for managing work, with each workspace dedicated to a specific task. This eliminates the need to switch branches within the workspace, streamlining the development process and aligning with the varied needs of different organizations.
Conclusion
I hope this blog has helped guide you through setting up and managing GitHub integration within the Coalesce environment. Here’s a quick recap of what we covered:
Creating a GitHub Repository: Setting up a new repo on GitHub.
Mapping Repository in Coalesce: Syncing your GitHub repo with Coalesce.
Understanding Commit in Coalesce: How to track and document code changes.
Understanding Branch and Branch Actions in Coalesce: Managing branches and related actions.
Merging Branches: Combining changes from different branches.
Handling Merge Conflicts: Resolving conflicts effectively.
Best Practices: Maintaining a clean commit history and effective code management.
For additional information, please check out our blog for guidance on how to build your next step with a Coalesce pipeline.