Anyone who has used Matillion probably recognizes its resourcefulness in simplifying and accelerating the ETL/ELT process. However, it is not out of the ordinary to run into situations where Matillion’s built-in tooling cannot accomplish all that is necessary on its own.Â
In this blog post, I will illustrate how both serverless functions and the Snowflake Data Cloud’s built-in features can bridge this gap, easily managing both structured and unstructured data.
The bulk of this guide will involve the AWS ecosystem, but I will include details on how to carry out these tasks in Azure and GCP as well.
Why Are Extra Tools Needed?
While Matillion provides a diverse array of features and components with which one can connect and then precipitate the ingestion of multisource data, some edge cases can inevitably cause problems. Examples of these edge cases include the following:
- API endpoints requiring unconventional OAuth configurationsÂ
- JSONs using inconsistent keys
- proprietary (non-standard) interfaces
All of these require a custom solution, and serverless functions serve as an excellent option.
To get started, let’s delve into a real-world example where extra tooling is needed.
Real-World Example
One finds that an API has information that needs to be ingested every hour. The API requires a FULL OAuth 1.0 header – signature, nonce, timestamp, etc. Because Matillion’s API tooling cannot accomplish this alone, cloud provider services will be used to fill in the gaps. The additional services which will be employed will be AWS SQS, SNS, S3, and Lambda. The steps for this solution are detailed below:
1. A scheduled Matillion job sends an SNS message to trigger the Lambda function.
2. The Lambda, utilizing the requests_oauthlib library to meet the API’s OAuth requirements, grabs the data from the API endpoint and saves it to an S3 bucket.
3. Once the Lambda is done loading the data to S3, it sends an SQS message to Matillion.
4. The SQS message triggers a Matillion orchestration job that then grabs the JSON from S3, loading it into a VARIANT column in a staging table in Snowflake.
5. Optional) Matillion loads the VARIANT data from the staging table, transforming it and loading it into fact/dimension tables.
While the previous example could be viewed as helpful enough on its own, the SNS and SQS services can also be used to send configuration and status information between Matillion and the serverless function.Â
For instance, the Lambda function can be coded to interface with multiple API endpoints, and the SNS message from Matillion (which triggers the Lambda function) can also send the config info to the function to specify which API to ingest and which parameters to use.Â
On the other hand, the Lambda function can send status information to Matillion using SQS to tell Matillion whether or not the data was successfully sent to S3. Using this information, Matillion can determine whether to load the S3 data into a Snowflake variant in a staging table–or whether errors should be entered into an error logging table.
Because the steps listed above are AWS service-focused, I am including some equivalent tools for the other two primary Cloud providers below:
AWS Lambda:
- Azure equivalent – Functions
- Google Cloud equivalent- Cloud Functions
AWS SQS and SNS:
- Azure equivalent – Storage Queues
- Google Cloud equivalent – Pub/Sub
AWS S3:
- Azure equivalent – Azure Blob Storage
- Google Cloud equivalent – Cloud Storage
Actions taken through other Cloud providers would be nearly indistinguishable from the actions in my example above–with the exception of Azure and GCP having singular services replacing both AWS SNS and AWS SQS.
Conclusion
Hopefully I have successfully demonstrated that serverless functions, used in conjunction with Snowflake’s data type features, can assist Matillion with bringing in any type of data from any source.
The versatility of the serverless functions allows for the ability to connect to any data source and apply any pre-load transformations, and the robustness of data types such as VARIANT, ARRAY, and OBJECT allows any type of data to be loaded into Snowflake. Combining these tools allows for endless possibilities with data extraction and transformation.Â
As a Gold partner for Matillion and an Elite Partner in Snowflake, phData specializes in helping growth-forward businesses get the most out of both platforms. If you’re looking for guidance, help, or answers, reach out to phData today!