Welcome to our AWS Redshift to the Snowflake Data Cloud migration blog!
In this blog, we’ll walk you through the process of migrating your data from AWS Redshift to the Snowflake Data Cloud.
We’ll cover the key considerations you should keep in mind when planning your migration, as well as the steps you’ll need to take to ensure a smooth transition. Whether you’re looking to take advantage of Snowflake’s lower cost, improved scalability, or enhanced performance, this guide will provide you with the information you need to get started.
Before we dive in, it’s important to note that there are multiple ways to migrate data from Redshift tables to Snowflake.
One popular route is leveraging third-party ETL tools like Fivetran to ensure a smooth and successful migration. However, if you want to avoid the costs of using any third-party ETL tools, the following approaches in this blog will do the trick.
For this blog, we’ll look at how to do this by using the Redshift unload command, Snowpipe, and Spark.
Here’s a quick outline of how we’ll migrate:
- Using the Redshift UNLOAD command to export to S3, then loading from S3 into Snowflake with Snowpipe
- Using Redshift and Snowflake JDBC connections with a Spark application
Prerequisites: a Snowflake account, an AWS Redshift cluster, an S3 bucket, and a Spark environment.
Why Migrate From AWS Redshift to Snowflake?
There are several reasons why your organization might consider migrating your data from Amazon Web Services (AWS) Redshift to the Snowflake Data Cloud. As an experienced data engineering consulting company, phData has helped with numerous migrations to Snowflake. Here are a few of the most common reasons why our clients move to Snowflake:
- Fully Managed: Snowflake offers a fully managed, cloud-based data warehousing service that is easy to set up and scale. This means that you don’t have to worry about managing and maintaining your own hardware and infrastructure, which can be time-consuming and costly.
- Cost Savings: Snowflake often has a lower total cost of ownership (TCO) compared to Redshift, as it charges based on usage rather than requiring upfront commitments. Snowflake’s highly flexible pricing model allows you to only pay for the resources you actually use, rather than being locked into a fixed-cost model.
- Scalability: Snowflake is designed to handle very large volumes of data and can scale up or down as needed, making it a good option for businesses with fluctuating workloads.
- Performance: Snowflake's multi-cluster, shared data architecture separates compute from storage, which allows it to process queries faster than Redshift for many workloads.
- Ease of Use: Snowflake has a simpler architecture than Redshift, which can make it easier to set up and manage.
- Support of Data Sources: Snowflake supports a wide range of data sources and formats, including structured and semi-structured data, and it allows you to query data using SQL and other popular programming languages.
- Data Integration: Snowflake has built-in support for data integration and transformation, making it easier to load and prepare data for analysis.
Overall, migrating to Snowflake can help you save time and money, and enable you to more easily and effectively work with your data.
Preparing for the Migration
Before migrating your data from Redshift to Snowflake, it’s important to consider the following tasks that will help you ensure a smooth transition:
- Create a list of databases in Redshift that need to be migrated.
- Create a list of tables in Redshift that need to be migrated.
- Prepare and create the DDLs for Snowflake (see the example after this list).
- Create a list of current security roles, users, and permissions.
- Create a list of Snowflake accounts that need to be created or already exist.
- Create a current architecture diagram.
- Create a list of current processes to migrate with no changes.
- Create a list of current processes to migrate that need reengineering.
- Create a list of current processes to migrate that need to be fixed.
- Create a planned architecture diagram.
- Create a planned information architecture.
- Determine the migration strategy (bulk transfer versus a staged migration).
- Create a list of new tools that will be introduced in the migration.
- Create a list of current tools that will no longer be used post-migration.
- Create a list of development environments needed for the migration.
- Create a list of automated processes used for migration.
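To illustrate the DDL preparation step referenced above, here is a minimal sketch of how a hypothetical Redshift table might translate to Snowflake. The table and columns below are made up for illustration; the main point is that Redshift-specific clauses such as DISTSTYLE, DISTKEY, and SORTKEY have no Snowflake equivalent and are simply dropped, while most column types map directly.
-- Hypothetical Redshift DDL
create table public.orders (
order_id bigint,
customer_id bigint,
order_ts timestamp,
order_total decimal(12,2)
)
diststyle key
distkey (customer_id)
sortkey (order_ts);
-- Equivalent Snowflake DDL: distribution and sort keys are removed
-- because Snowflake manages micro-partitioning automatically
create or replace table public.orders (
order_id bigint,
customer_id bigint,
order_ts timestamp_ntz,
order_total number(12,2)
);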
How to Migrate from AWS Redshift to Snowflake Using Redshift Unload Command to S3, From S3 to Snowflake Using Snowpipe
Redshift provides an UNLOAD command that exports data to an AWS S3 location. The steps below walk through pulling data from Redshift to S3 and then using Snowpipe to ingest it into Snowflake tables.
Steps:
- Unload the data to S3
- Create the external stage
- Create the auto ingest Snowpipe
1. Unload the Data to S3
Run the below statement in the Redshift query editor:
UNLOAD ('select-statement')
TO 's3://path/filename'
CREDENTIALS 'aws_access_key_id=XXXXXXX;aws_secret_access_key=XXXXXXX'
DELIMITER '|'
HEADER
CSV;
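If you prefer not to embed access keys in the statement, UNLOAD also accepts an IAM role attached to the Redshift cluster. A minimal sketch, where the role ARN is only a placeholder:
UNLOAD ('select-statement')
TO 's3://path/filename'
IAM_ROLE 'arn:aws:iam::123456789012:role/my-redshift-unload-role'
DELIMITER '|'
HEADER
CSV;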
Once the files are in the S3 location, we can use Snowpipe to pull the data from S3 and load it into Snowflake tables (see the example below).
2. Create the External Stage
Next, run the below statement in the Snowflake query editor:
create or replace stage mystage
file_format = (type = 'CSV')
storage_integration = PHDATA_S3_READ
url = 's3://phdata-snowflake-stage/data/user_pgardhas/test_files/';
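The pipe created in the next step copies into demo_db.public.mytable, so that table must exist first. As a minimal sketch, reusing the hypothetical orders columns from earlier (adjust the columns to match your actual unload query):
-- Hypothetical target table; columns must match the unloaded files
create or replace table demo_db.public.mytable (
order_id bigint,
customer_id bigint,
order_ts timestamp_ntz,
order_total number(12,2)
);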
3. Create a Pipe with Auto-Ingest Enabled
The last step is to run the below statement in the Snowflake query editor:
create pipe demo_db.public.mypipe auto_ingest=true as
copy into demo_db.public.mytable
from @demo_db.public.mystage
file_format = (type = 'CSV' field_delimiter = '|' skip_header = 1);
Once files arrive in the S3 location, Snowpipe will detect them and load the data into the corresponding Snowflake table. Note that auto-ingest relies on S3 event notifications: the bucket must be configured to send event notifications to the SQS queue listed in the pipe's notification_channel.
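A few Snowflake commands are useful for wiring up and verifying the pipe. SHOW PIPES exposes the notification_channel (an SQS queue ARN) to configure in the S3 bucket's event notifications, ALTER PIPE ... REFRESH picks up files that were already in the stage before the pipe existed, and SYSTEM$PIPE_STATUS reports the pipe's current state:
-- Find the notification_channel (SQS ARN) to use in the S3 event notification
show pipes;
-- Load files that were already in the stage before the pipe was created
alter pipe demo_db.public.mypipe refresh;
-- Confirm the pipe is running and check for pending files
select system$pipe_status('demo_db.public.mypipe');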
Using Redshift and Snowflake JDBC Connections with Spark
If you are more comfortable with coding and want to build a Spark application, another option is to connect to the Redshift and Snowflake tables over JDBC. The Spark code snippet below shows how to read from Redshift and load the data into a Snowflake table.
First, create the Redshift connection and read the Redshift table into a DataFrame. Next, define the Snowflake connection parameters. Lastly, use the Spark DataFrame write API to load the data into the Snowflake table.
testApplication.py
from pyspark.sql import SQLContext
from pyspark.sql import SparkSession
from pyspark import SparkConf, SparkContext
# Initialize the Spark session
spark = SparkSession.builder.master("yarn").appName("Redshift_to_Snowflake_Load").enableHiveSupport().getOrCreate()
sc = spark.sparkContext
sqlContext = SQLContext(sc)
# S3 credentials for the temporary directory used by the Redshift connector
sc._jsc.hadoopConfiguration().set("fs.s3.awsAccessKeyId", "")
sc._jsc.hadoopConfiguration().set("fs.s3.awsSecretAccessKey", "")
# Read the Redshift table into a DataFrame
df = spark.read \
.format("com.databricks.spark.redshift") \
.option("url", "jdbc:redshift://url.XXXXXXX") \
.option("dbtable", "table_name") \
.option("tempdir", "s3://mybucket/") \
.option("aws_iam_role", "XXXXXXX") \
.load()
# Snowflake connection parameters
sfOptions = {
"sfURL" : ".snowflakecomputing.com",
"sfUser" : "",
"sfPassword" : "",
"sfDatabase" : "",
"sfSchema" : "",
"sfWarehouse" : ""
}
# Write the DataFrame to the Snowflake table
df.write.format("net.snowflake.spark.snowflake").options(**sfOptions).option("dbtable", "").mode('append').save()
Run command:
spark-submit --master yarn --deploy-mode client --jars snowflake-jdbc-3.13.6.jar,spark_snowflake_2.11-2.9.1-spark_2.4.jar,RedshiftJDBC42-no-awssdk-1.2.34.1058.jar testApplication.py
Best Practices
Here are a few best practices to consider:
- Have a clear vision of what data needs to migrate into Snowflake.
- Outline and share a clear existing Redshift model and a new Snowflake model.
- Plan and manage costs effectively. The pay-as-you-go model is a major reason why companies deploy Snowflake as it greatly reduces infrastructure costs.
- Establish a migration time frame.
- Keep the application simple. Validate and test the data after the migration to make sure that all data has been migrated completely and accurately (a simple validation example follows this list).
- Create an automation process.
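As a simple starting point for the validation item above, compare row counts and a basic aggregate between the two systems. A minimal sketch, assuming the hypothetical orders table used earlier exists on both sides:
-- Run in Redshift
select count(*) as row_count, sum(order_total) as total_amount
from public.orders;
-- Run in Snowflake; the results should match the Redshift output
select count(*) as row_count, sum(order_total) as total_amount
from demo_db.public.orders;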
Conclusion
In conclusion, migrating from AWS Redshift to Snowflake can provide many benefits for organizations looking to more easily and effectively work with their data.
If you are considering migrating from AWS Redshift to Snowflake, it is important to carefully plan and execute the process to ensure a smooth transition. By following best practices and working with a trusted partner, you can successfully migrate your data and realize the full potential of Snowflake.
At phData, we excel at helping clients succeed with Snowflake. If you're looking for advice, help, or answers, phData is happy to assist.