Case Study

US Telecom Giant Sharpens ML Capabilities Using Feature Store + Snowflake


Customer's Challenge

A prominent leader in the US telecom industry sought to centralize and streamline how its users access and handle business data, with the goal of improving the development and deployment of its machine learning models.

phData's Solution

phData proposed implementing a Feature Store, based on the Feast Framework, integrated with Snowflake. This solution would give downstream data consumers and ML engineers access to real-time inference data in a standardized form that adheres to all pertinent business rules and cleansing procedures.

Results

The eight-week project yielded significant improvements in data standards, data segmentation, and processes for the client. The telecom's data scientists and BI users now have access to consolidated datasets devoid of sensitive information, an important step forward in the client's data governance practices.

The project also introduced abstraction layers that let ML engineers collaborate more effectively. This collaborative framework supports team-based model development, result validation, troubleshooting, and other essential activities, contributing to a more agile and efficient workflow for developing and deploying machine learning models.

The Full Story

Before phData's engagement, the customer had already established multiple ad hoc processes for capturing ML data to train models and support real-time inference on newly streamed datasets. These disparate processes presented notable challenges, however, including code duplication and susceptibility to human error during model updates, troubleshooting routines, and the creation of new models.

This patchwork of processes led to inefficiencies and made it difficult to maintain consistent data manipulation standards. Recognizing the need for a more streamlined and standardized approach, phData stepped in with a comprehensive data initiative.


Why phData?

The customer was drawn to phData's extensive experience with Snowflake and its distinctive approach: tailored yet efficient strategies that deliver value and foster enduring partnerships.

Implementing a Feature Store Integrated with Snowflake

This initiative centered on implementing a Feature Store based on the Feast Framework, with Snowflake integration as its key foundation.

Snowflake was used to ingest, transform, and store the datasets that would then be mapped into the Feature Store. Snowflake also served as the metadata store for the Feature Store, holding the metadata for the mapped datasets, keys, references, and other essential information required by the Feast Framework.

The diagram below (Image 1) uses a UML use-case format to give a high-level overview of how Feast supports the various data use cases.

A Feature Store, built upon the Feast framework, serves as a central repository for organizing and managing machine learning features. It consolidates diverse datasets and provides a unified platform for storing, discovering, and accessing features crucial for ML model training and inference. 

The Feast framework enables the seamless integration of features into ML pipelines, ensuring consistency and accuracy across different development lifecycle stages. Notably, a Feature Store enhances collaboration among data scientists and ML engineers by establishing a shared and standardized source of truth for features. 

It facilitates efficient experimentation, model development, and deployment by promoting versioning, monitoring, and governance of features. By centralizing and standardizing feature management, it contributes directly to improved model quality, reproducibility, and scalability.

A diagram showing use cases impacted by data initiative
Image 1: Use Cases Impacted by Data Initiative
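To ground this in code, the sketch below shows how a feature view might be declared in a Feast repository backed by Snowflake, in the spirit of the entities mapped during this project. The database, table, column, and feature names are hypothetical placeholders rather than the client's actual definitions.

```python
from datetime import timedelta

from feast import Entity, FeatureView, Field
from feast.infra.offline_stores.snowflake_source import SnowflakeSource
from feast.types import Float32, Int64

# Entity the features are keyed on (illustrative name and join key).
customer = Entity(name="customer", join_keys=["CUSTOMER_ID"])

# Curated (Silver-stage) table in Snowflake that backs the feature view.
usage_source = SnowflakeSource(
    database="ANALYTICS",
    schema="SILVER",
    table="CUSTOMER_USAGE_FEATURES",
    timestamp_field="EVENT_TIMESTAMP",
)

# Feature view mapping Snowflake columns to named, typed ML features.
customer_usage = FeatureView(
    name="customer_usage",
    entities=[customer],
    ttl=timedelta(days=7),
    schema=[
        Field(name="AVG_DAILY_MINUTES", dtype=Float32),
        Field(name="DATA_GB_LAST_30D", dtype=Float32),
        Field(name="SUPPORT_CALLS_90D", dtype=Int64),
    ],
    source=usage_source,
)
```

Once applied to the registry, definitions of this kind become the shared, versioned source of truth from which data scientists and ML engineers retrieve features.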

Before & After

The two diagrams below (Image 2 and Image 3) illustrate the AS-IS scenario, the state before phData's implementation, and the TO-BE scenario, the outcome delivered by the project. Notably, the Feature Store functions as an intermediary layer between the end data users and the datasets housed in Snowflake. This layer is encapsulated within repositories that interact with the entities' datasets, forming an abstraction that enables a standardized approach to fetching collections of data.

This strategic abstraction mitigates code duplication and ensures uniform adherence to established business rules and data cleansing protocols across all applications, reinforcing a cohesive and streamlined data utilization framework within the company.

A diagram showing the AS-IS data architecture
Image 2: AS-IS Data Architecture
A diagram showing the TO-BE data architecture
Image 3: TO-BE Data Architecture (delivered)
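To illustrate the repository abstraction shown in the TO-BE diagram, the hypothetical sketch below wraps Feast's standard retrieval APIs so every consumer fetches features the same way, whether building a training set or serving a real-time prediction. Class, path, and feature names are illustrative and reuse the placeholder feature view from the earlier sketch.

```python
from typing import List

import pandas as pd
from feast import FeatureStore


class CustomerFeatureRepository:
    """Thin repository layer that standardizes feature access via Feast."""

    FEATURES: List[str] = [
        "customer_usage:AVG_DAILY_MINUTES",
        "customer_usage:DATA_GB_LAST_30D",
        "customer_usage:SUPPORT_CALLS_90D",
    ]

    def __init__(self, repo_path: str = "feature_repo"):
        self.store = FeatureStore(repo_path=repo_path)

    def training_frame(self, entity_df: pd.DataFrame) -> pd.DataFrame:
        # Point-in-time correct join against the Snowflake offline store.
        return self.store.get_historical_features(
            entity_df=entity_df,
            features=self.FEATURES,
        ).to_df()

    def online_features(self, customer_id: int) -> dict:
        # Low-latency lookup for real-time inference.
        return self.store.get_online_features(
            features=self.FEATURES,
            entity_rows=[{"CUSTOMER_ID": customer_id}],
        ).to_dict()
```

Centralizing access behind a single interface like this is what removes the duplicated, per-model data-fetching code that characterized the AS-IS scenario.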

Establishing the Feature Store also involved comprehensive data-alignment discussions to delineate and refine the existing data entities to be mapped within the Feast Framework. This phase gave phData a valuable opportunity to collaborate with the client on materialization strategies, transforming raw data into refined, cleansed datasets in the Silver stage.

The materialization process was executed through Snowflake tasks to accommodate constraints from the customer's DevOps team. phData also furnished the client with illustrative examples showcasing the potential of alternative solutions such as Snowflake dynamic tables, showing how to empower the data teams, accelerate their work, and reduce maintenance costs while improving operational efficiency.
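As a rough illustration of that alternative, the sketch below uses the Snowflake Python connector to create a dynamic table that keeps a cleansed Silver-stage table refreshed declaratively, rather than orchestrating the refresh through a scheduled task. Connection details, table names, and the transformation itself are placeholders, not the client's actual pipeline.

```python
import snowflake.connector

# Illustrative: a dynamic table declares its transformation and freshness
# target, and Snowflake handles the incremental refresh that a task-based
# pipeline would otherwise have to orchestrate.
CREATE_DYNAMIC_TABLE = """
CREATE OR REPLACE DYNAMIC TABLE ANALYTICS.SILVER.CUSTOMER_USAGE_FEATURES
  TARGET_LAG = '15 minutes'
  WAREHOUSE = TRANSFORM_WH
AS
SELECT
    CUSTOMER_ID,
    AVG(CALL_MINUTES)    AS AVG_DAILY_MINUTES,
    SUM(DATA_GB)         AS TOTAL_DATA_GB,
    MAX(EVENT_TIMESTAMP) AS EVENT_TIMESTAMP
FROM ANALYTICS.RAW.CUSTOMER_EVENTS
GROUP BY CUSTOMER_ID
"""

conn = snowflake.connector.connect(
    account="<account>",   # placeholder credentials
    user="<user>",
    password="<password>",
    warehouse="TRANSFORM_WH",
)
try:
    conn.cursor().execute(CREATE_DYNAMIC_TABLE)
finally:
    conn.close()
```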

phData also wanted to ensure the client's team could continue to progress independently, mapping new data entities and maximizing the benefits of the Feature Store implementation.

To that end, phData not only documented the implemented architecture but also delivered comprehensive hand-over training sessions for the customer's data professionals, along with practical documentation explaining how to interact with the stack established during the project.

A diagram showing the development workflow for the Feature Store integration
Image 4: Feature Store integration - Development Workflow

phData followed this development workflow to collaborate within the customer's environment, engaging and aligning with the main stakeholders of each model identified as a candidate for immediate integration with the Feature Store stack. The workflow also serves as a guide for the local data teams to follow when new ML models are conceived in the future.

phData also explored further functionality offered by the Feast Framework to cover additional use cases for the customer. One highlighted feature was Feast's saved datasets, which help teams prototype and share datasets.
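A rough sketch of how a team might use that capability is shown below; it assumes the placeholder feature view from the earlier examples, and the storage class usage and names are illustrative rather than taken from the client's implementation.

```python
import pandas as pd
from feast import FeatureStore
from feast.infra.offline_stores.snowflake_source import SavedDatasetSnowflakeStorage

store = FeatureStore(repo_path="feature_repo")

# Entity dataframe defining which customers and timestamps to prototype with.
entity_df = pd.DataFrame(
    {
        "CUSTOMER_ID": [1001, 1002, 1003],
        "event_timestamp": pd.to_datetime(["2024-01-05"] * 3),
    }
)

# Build a point-in-time correct training set from the Snowflake offline store.
job = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "customer_usage:AVG_DAILY_MINUTES",
        "customer_usage:DATA_GB_LAST_30D",
    ],
)

# Persist it as a named, shareable saved dataset backed by a Snowflake table
# (table name is illustrative).
store.create_saved_dataset(
    from_=job,
    name="churn_prototype_v1",
    storage=SavedDatasetSnowflakeStorage(table_ref="SAVED_CHURN_PROTOTYPE_V1"),
)

# Later, another team member can reload exactly the same dataset.
df = store.get_saved_dataset("churn_prototype_v1").to_df()
```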

The Final Solution

The diagram below (Image 5) offers a comprehensive view of the final solution phData deployed in the customer's environment. It delineates the overall system architecture and also illustrates the CI/CD workflow.

Specifically, it highlights the sequential steps involved when modifications are introduced to the Feast registry repository. phData orchestrated a robust CI/CD pipeline using GitHub Actions, automating the entire process of updating entity mappings in the Snowflake registry database based on changes developers push to the GitHub repository.
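A minimal sketch of the automation step such a pipeline might run is shown below. It assumes a standard Feast repository layout and relies on the stock `feast plan` and `feast apply` commands; paths are illustrative, and the delivered pipeline orchestrated this through GitHub Actions.

```python
"""CI step: validate and apply Feast registry changes.

A GitHub Actions job could run this script after a pull request is merged,
so the Snowflake-backed registry always reflects the repository contents.
"""
import subprocess
import sys

FEATURE_REPO = "feature_repo"  # illustrative path to the Feast repo


def run(cmd):
    print("$ " + " ".join(cmd))
    result = subprocess.run(cmd, cwd=FEATURE_REPO)
    if result.returncode != 0:
        sys.exit(result.returncode)


if __name__ == "__main__":
    # Show what would change in the registry and infrastructure...
    run(["feast", "plan"])
    # ...then apply the feature definitions to the registry.
    run(["feast", "apply"])
```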

The workflow also streamlines the technical processes and fosters a well-structured software development lifecycle. It promotes systematic, reliable progress by ensuring that changes undergo thorough development and testing across multiple environments before reaching production.

Proper versioning and release control mechanisms further enhance collaboration among data engineers and data scientists, improving the efficiency of their work and enabling a cohesive, organized integration of changes into the production environment.

A diagram showing the final solution design (considering CI/CD)
Image 5: Final Solution Design (considering CI/CD)

Take the next step with phData.

Learn how phData can help solve your most challenging data analytics and machine learning problems.
