August 11, 2021

How to Set Up Multi-Environment CI/CD Pipelines on AWS

By Satish Sanjeev Addagarla

Many of our customers have a common challenge: the lack of a process for using Continuous Integration and Continuous Delivery (CI/CD) to create and maintain AWS infrastructure environments.

The challenge arises from a few key areas. First, infrastructure as code (IaC) is a relatively new idea and doesn’t have the tool or process maturity that other aspects of software development have. That means you see a variety of approaches and techniques that aren’t ubiquitous.

Second, the desire to treat infrastructure and application code exactly the same from a CI/CD perspective leads to some heartburn—there are differences in the process and approach that need to be accounted for.

What we need instead is a reliable, easy-to-manage deployment pattern driven by infrastructure as code, plus an automated workflow to handle infrastructure deployments across multiple environments.

In this post, we provide a prescriptive approach to configuring, deploying, and managing multiple AWS infrastructure environments using IaC and the Cloud Foundation tooling phData has developed.

Disclaimer: If you are already familiar with the key concepts of IaC, AWS CloudFormation, and phData Cloud Foundation, feel free to jump to the multi-environment CI/CD section. Otherwise, we recommend you check out the two sections below to get acquainted.

What are CloudFormation and Sceptre?

CloudFormation is an AWS service that allows you to deploy and manage infrastructure using IaC: templates written in JSON or YAML. These templates are deployed and managed as stacks on AWS. A stack is a collection of resources that can be created, updated, or deleted as a single unit.

Sceptre is a tool that drives CloudFormation deployments, extending CloudFormation with the ability to orchestrate, replicate, and manage stacks on AWS. It allows you to chain the outputs of one stack into the inputs of another. The tool uses a directory structure to separate the configuration files (parameters) for multiple environments while reusing the same templates across all of them, and it supports CloudFormation templates developed in JSON, YAML, Jinja2, or Python.
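As a small illustration of output chaining, a stack config can reference another stack's output with Sceptre's !stack_output resolver. The file, stack, and output names below (app.yaml, vpc.yaml, VpcId) are hypothetical:

config/dev/app.yaml

template_path: app.yaml

parameters:
  # Pull the VpcId output from the stack defined in config/dev/vpc.yaml
  VpcId: !stack_output vpc.yaml::VpcId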

What is phData Cloud Foundation?

Even though CloudFormation and Sceptre provide strong capabilities for managing cloud infrastructure, on their own they don't give you a pattern that addresses all of the challenges outlined below. That's why we built phData Cloud Foundation.

Cloud Foundation is a library of production-ready AWS CloudFormation templates combined with an additional layer of scripts (the Cloud Foundation scripts). Together, they provide the orchestration needed for a complete multi-environment infrastructure provisioning solution.

Cloud Foundation facilitates an automated, IaC approach for deploying and supporting data products. By automating infrastructure provisioning with the tool’s wide library of proven, production-ready AWS CloudFormation templates, you’ll deploy faster in a consistent and repeatable fashion.

The Challenges of Multi-Environment CI/CD Pipelines

First, let's look at the challenges teams face when setting up multiple AWS infrastructure environments, and how phData Cloud Foundation addresses each of them.

Challenge: Mapping environments to AWS accounts
Solution: Environment mapping is defined in a YAML file

Challenge: Lower environments are scaled smaller than production
Solution: Configuration and parameters are separated by environment, so lower environments can use different settings

Challenge: Lower environments may not need the same approvals as production
Solution: Additional approval cycles can be configured for the production environment

Challenge: Production deployments can only happen during specified deployment windows
Solution: Production deployment timing is controlled through an additional manual approval

Challenge: Production deployments may target multiple regions
Solution: The production environment can be replicated to multiple regions

Challenge: Teams need visibility into exactly what is changing and how those changes are applied
Solution: The deployment execution plan and execution summary are recorded as comments on the pull request

Building a Multi-Environment CI/CD Pipeline

Solution Overview

The following figure provides an overview of the multi-environment provisioning solution we are deploying.

Definitions

Deploy Plan – Deployment execution plan that details the changes to be deployed in a cloud environment.

Deploy Summary – Deployment execution summary that details the changes that were deployed in a cloud environment.

Solution Workflow

This post explains the workflow using two environments: development and production.

Below is an example workflow for development and production environments that follows the pull request workflow as explained in the solution capabilities. 

Source Code Repository Structure

To use this deployment pattern, the source code repository structure follows the Sceptre project directory structure as documented on the Sceptre home page. 

Sceptre uses the configuration files under the config directory and the CloudFormation templates under the templates directory to launch CloudFormation stacks.
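For illustration, a minimal project with development and production environments might be laid out as shown below (file names are examples, not requirements):

.
├── config
│   ├── config.yaml
│   ├── dev
│   │   ├── config.yaml
│   │   └── network_stack.yaml
│   └── prod
│       ├── config.yaml
│       └── network_stack.yaml
├── templates
│   └── network.py
├── deployment.yaml
└── buildspec.yaml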

Config

The config directory contains stack configuration files with the parameters required by the CloudFormation templates.

These configuration files are separated into per-environment directories, allowing you to define different configurations for each environment, such as development, QA, and production. The template_path attribute in each config file points to the CloudFormation template used to build that stack.

A simple configuration file for building a network stack looks like this:

network_stack.yaml

template_path: network.py

parameters:
  NetworkName: phdata-prod-network
  VPCCIDR: "10.0.0.0/20"
  DHCPOptionConditionParam: "false"
  PublicSubnet1CIDR: 10.0.0.0/22
  PublicSubnet2CIDR: 10.0.4.0/22
  PrivateSubnet1CIDR: 10.0.8.0/22
  PrivateSubnet2CIDR: 10.0.12.0/22

stack_tags:
  - "owner=phdata.io"
  - "env=prod"

In the example above, the template_path attribute refers to network.py, a CloudFormation template developed in Python. The parameters and tags are passed to the template dynamically when the stack is deployed.

Templates

The templates directory contains CloudFormation templates developed in JSON, YAML, Python, Jinja2, or AWS CDK. Templates developed in JSON and YAML are treated as standard CloudFormation templates, while templates developed in Python, Jinja2, and AWS CDK are rendered into CloudFormation templates by Sceptre at runtime.

(Refer to Sceptre documentation to understand how to use Python and Jinja2 templates.)

The outline for the network_stack looks like this: 

network.py

class Network:
    def __init__(self, sceptre_user_data):
        # Parameter processing
        ...

    def add_vpc(self, sceptre_user_data): ...
    def add_internet_gateway(self, sceptre_user_data): ...
    def add_public_route_table(self, sceptre_user_data): ...
    def add_private_subnet(self, sceptre_user_data, num): ...
    def add_public_subnet(self, sceptre_user_data, num): ...
    def add_nat_gateway(self, sceptre_user_data): ...
    def add_s3_endpoint(self, sceptre_user_data): ...
    def add_dhcp_option(self, sceptre_user_data): ...

In this template, each AWS network resource is created by a dedicated method of the Python class.
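The post doesn't show the method bodies. As a rough sketch only, and assuming the template is built with a library such as troposphere (an assumption; the actual Gold Template implementation may differ), the add_vpc method and the sceptre_handler entry point that Sceptre expects from a Python template might look like this. Note that values consumed by a Python template are supplied through sceptre_user_data, so the keys below (VPCCIDR, NetworkName) are assumed to be passed that way:

# Illustrative sketch only -- assumes the troposphere library
from troposphere import Tags, Template, ec2


class Network:
    def __init__(self, sceptre_user_data):
        # Parameter processing: keep the user data and start an empty template
        self.sceptre_user_data = sceptre_user_data
        self.template = Template()

    def add_vpc(self, sceptre_user_data):
        # Create the VPC from the CIDR supplied in the stack config
        self.vpc = self.template.add_resource(
            ec2.VPC(
                "VPC",
                CidrBlock=sceptre_user_data["VPCCIDR"],
                EnableDnsSupport=True,
                EnableDnsHostnames=True,
                Tags=Tags(Name=sceptre_user_data["NetworkName"]),
            )
        )


def sceptre_handler(sceptre_user_data):
    # Sceptre calls this function and deploys the returned template body
    network = Network(sceptre_user_data)
    network.add_vpc(sceptre_user_data)
    return network.template.to_yaml()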

Gold Templates

Using the deployment pattern explained in this post, you can also refer to prebuilt infrastructure patterns and templates published in a repository manager such as Artifactory.

At phData, this is how we securely expose our ready-to-go CloudFormation templates to customers, covering services and platforms such as Amazon EMR, Airflow, Amazon Redshift, AWS DMS, Snowflake, Cloudera, Hadoop, Amazon Elasticsearch Service, and more. These ready-to-go templates are called Gold Templates and include best practices tailored to data-focused workloads such as machine learning pipelines, cloud-native data warehouses, IoT, and data lakes.

In the above network stack example, both the config and the template for the network stack are maintained in an IaC source code repository. phData Cloud Foundation customers maintain the config files with the required parameters, while the actual template is managed by phData, allowing customers to upgrade to newer versions of the template or to use their own templates within the source code repository.

Deployment descriptor

The deployment.yaml file in the repository is called a deployment descriptor. This is where the environments (dev, production, etc.) are described and stacks are explicitly listed under the deploy, undeploy, and ignore sections of the file. The descriptor also gives engineers the flexibility and control to deploy dependent stacks in the right order.

Based on this file, the Cloud Foundation scripts determine which stacks to deploy, delete, or ignore in each environment:

deployment.yaml

# cloudfoundation deployment descriptor

# define all your environments below
# stacks are deployed into environments based on the deploy_env attribute specified under the deploy section
deploy_environments:
  - { name: dev, account_id: dev-aws-account-id }
  - { name: prod, account_id: prod-aws-account-id }

# stacks listed under deploy will be deployed by the phData Cloud Foundation tool
deploy:
  - { name: network.yaml, phdata_gold_template: true, deploy_env: [dev, prod] }
  - { name: airflow/airflow-infra.yaml, phdata_gold_template: true, version: 1.0.0, deploy_env: [dev, prod] }
  - { name: logging.yaml, phdata_gold_template: false, deploy_env: [dev, prod] }

# stacks listed under undeploy will be deleted by the phData Cloud Foundation tool if they already exist
undeploy:
  - { name: s3-bucket, deploy_env: [prod] }

# stacks listed under ignore are deployed by some other means;
# the phData Cloud Foundation tool takes no action on them
ignore:
  - { name: ad.yaml, deploy_env: [prod] }

Build Specification

The buildspec.yaml file is used by the AWS CodePipeline build stage to pull the Cloud Foundation scripts from GitHub and execute them in the required order.
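The post doesn't reproduce the file itself, but a minimal sketch of such a buildspec is shown below. The repository URL and script names are placeholders, not the actual Cloud Foundation scripts:

buildspec.yaml

# Illustrative sketch -- repository URL and script names are placeholders
version: 0.2

phases:
  install:
    runtime-versions:
      python: 3.9
    commands:
      # Pull the Cloud Foundation scripts from GitHub (placeholder URL)
      - git clone https://github.com/example-org/cloud-foundation-scripts.git
      - pip install -r cloud-foundation-scripts/requirements.txt
  build:
    commands:
      # Generate the deploy plan and apply the stacks described in deployment.yaml
      - python cloud-foundation-scripts/deploy.py --descriptor deployment.yaml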

Customization

There is no one-size-fits-all solution for infrastructure provisioning. The solution depends on customer environments and requirements driven by multiple factors.

While most phData customers use the solution explained in this post as-is, additional customization is sometimes needed depending on requirements. For example, we have deployed comment-based infrastructure provisioning for a couple of our customers.

When an authorized approver comments cf deploy dev on the pull request, the infrastructure changes are deployed in the development environment. If the approver comments cf deploy prod, the infrastructure changes are deployed in the production environment.

Sample Sequence Diagram

In Closing

In this post, you learned how to set up multi-environment CI/CD pipelines on AWS using phData Cloud Foundation. For step-by-step guidance, we've documented instructions for deploying this multi-environment infrastructure provisioning solution; the quick start guide is available in the Cloud Foundation documentation. You can start with a template project and deploy the pipeline by following the instructions, which walk through configuring the source code repository and AWS CodePipeline for a multi-environment CI/CD pipeline.

Still need help?

At phData, we live, breathe, and thrive on helping businesses solve complex problems. If you have any lingering questions or if you’re interested in exploring how we can help, don’t hesitate to reach out. We’re happy to answer any questions.
