Hello and welcome to our May update of the phData Toolkit! Our toolkit is built to expedite your data migration project and give you tools to build and enforce an information architecture and data governance practice.
This includes capabilities within the Snowflake Data Cloud such as:
- Tracking and auditing changes to objects, roles, and individual permissions
- Visualizing your role hierarchy and object relationships
- Integrating your ITSM tool with Snowflake for access requests
- Automating the provisioning of resources
- Decoupling engineering teams from servicing common business requests
For our May update, we’ve got some great improvements to call out across the platform. So let’s get right into it.
SQL Translation Updates
One of the core tools within the phData Toolkit is the free-to-use SQL Translation tool that allows customers and our own engineers to programmatically translate SQL queries between different SQL dialects. This is incredibly useful for enterprises looking to migrate their data platform from, say, Hadoop to Snowflake.
It’s common for enterprises to have hundreds of thousands of lines of Data Manipulation Language (DML) and Data Definition Language (DDL) code.
We’ve been focusing heavily on our Netezza to Snowflake translation over the last month to support ongoing customer projects. Let’s take a deep dive into the new functionality and fixes we’ve accomplished.
Netezza to Snowflake
We’ve added support for the following:
- Implemented the “/=/” operator
- Implemented the quadruple identifier (a.b.c.d)
- Implemented the Create Library statement
- Implemented the TO_HEX function
We’ve also fixed some parsing issues with the following:
- bitwise operators (>>, <<, &)
- Create History statement
- using “del” as an identifier
- Sub-query operator
Let’s take a look at some examples.
One of the more interesting operators within Netezza is “/=/”. This operator is commonly used in join and filter conditions to match rows where the two values are equal, or where both are null.
Netezza:
select
  *
from
  t1
where
  a /=/ 4;

select
  *
from
  t2
where
  c1 /=/ b;
Snowflake:
SELECT
  *
FROM
  t1
WHERE
  a = 4;

SELECT
  *
FROM
  t2
WHERE
  (
    c1 = b
    OR (
      c1 IS NULL
      AND b IS NULL
    )
  );
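As an aside, Snowflake also provides a built-in EQUAL_NULL function with the same null-safe semantics, so a hand-written equivalent of the second query could be written as:

```sql
-- EQUAL_NULL(c1, b) is TRUE when c1 = b, or when both are NULL
SELECT
  *
FROM
  t2
WHERE
  EQUAL_NULL(c1, b);
```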
One of the hardest parts when transitioning between data platforms is feature parity. It’s common that functionality exists in one system that doesn’t match up or exist in another. These systems often have different ways that they tackle similar problems.
For example, let’s look at the Netezza “Create History” translation.
Netezza:
CREATE HISTORY CONFIGURATION hist1 NPS LOCALHOST USER ' user1' PASSWORD 'password' LOADINTERVAL -1;
Snowflake:
-- ERROR: NetezzaStmtVisitor: line 1, character 0: Snowflake does not support `CREATE HISTORY CONFIGURATION` statement.
/* CREATE HISTORY CONFIGURATION hist1
NPS LOCALHOST
USER ' user1'
PASSWORD 'password'
LOADINTERVAL -1; */
In this case, Snowflake handles history differently: with Time Travel and streams, engineers can build audit logs and view the history of a table.
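As a minimal sketch of what that looks like in Snowflake (the table, stream, and audit-table names here are hypothetical), change tracking might be set up as:

```sql
-- Capture row-level changes (inserts, updates, deletes) on a table
CREATE STREAM t1_changes ON TABLE t1;

-- Periodically consume the accumulated changes, e.g. into an audit table
INSERT INTO t1_audit
SELECT *, CURRENT_TIMESTAMP() AS audited_at
FROM t1_changes;

-- Time Travel: query the table as it looked one hour ago
SELECT * FROM t1 AT(OFFSET => -3600);
```

Reading from the stream inside a DML statement advances its offset, so each batch of changes is consumed exactly once.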
Next, let’s look at the bitwise operators that are frequently used for bit-shifting.
Netezza:
select
  SUM((("val1".USAGE & (1 << 0)) >> 0)) AS col1
from
  t1;
Snowflake:
SELECT
  SUM(
    (
      BITSHIFTRIGHT((BITAND("val1".USAGE, (BITSHIFTLEFT(1, 0)))), 0)
    )
  ) AS col1
FROM
  t1;
Project Administration Updates
One of the other major parts of the toolkit is our Project Administration tool (formerly known as Tram). This tool allows you to integrate ITSM (or git) workflows with user, permission, and object provisioning automation. Project Administration is a configuration-based tool, delivered as a phData-provided Java JAR, that can be run manually or within an automation tool.
Over the course of the last month, we’ve been focusing on platform level upgrades to the tool.
One of the quality-of-life upgrades we’ve made is having Project Administration check that a schema exists before executing statements against it. During the initial configuration and setup of Project Administration, the tool automatically creates a few tables to track executed statements and other metadata. Previously, Project Administration would error if the schema for those tables didn’t exist.
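The resulting behavior is roughly equivalent to guarding the metadata setup with existence checks (the schema and table names below are hypothetical, for illustration only):

```sql
-- Ensure the metadata schema exists before creating tracking tables in it
CREATE SCHEMA IF NOT EXISTS TOOLKIT_METADATA;

-- Statement-tracking table, created only if missing
CREATE TABLE IF NOT EXISTS TOOLKIT_METADATA.EXECUTED_STATEMENTS (
  statement_text VARCHAR,
  executed_at    TIMESTAMP_NTZ
);
```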
We also fixed a minor bug with the “MUST_RESET_PASSWORD” feature for user creation.
Pipeline Automation Updates
Pipeline Automation (formerly known as Streamliner) has received a few updates this month as well. This tool helps create the objects required within Snowflake to quickly ingest data into the platform via Snowpipe.
Pipeline Automation has the ability (for some systems) to crawl your existing database schema and gather information about the tables you’re planning to migrate to Snowflake. Previously, this schema crawling was only available for Oracle systems; however, we’ve now added support for Hive and Impala as well! We’ve also improved the performance of crawling these schemas.
As part of our ongoing security scrutiny, we’ve also reduced the number of dependencies within Pipeline Automation to minimize our security vulnerability footprint. This included removing an unused dependency that had not resolved the latest Log4j vulnerability.
phData Toolkit
If you haven’t already explored the phData Toolkit, we highly recommend checking it out!
We encourage you to spend a few minutes browsing the apps and tools available in the phData Toolkit today to set yourself up for success in 2022.
Be sure to follow this series for more updates to the phData Toolkit tools and features.