Customer's Challenge
A top manufacturer of mining and earth-moving equipment sought to boost revenue with new offerings, including smart-connected equipment and post-purchase proactive maintenance services. That meant transforming their existing sensor-based analytics platform into a more efficient, centralized, IoT data solution. And that meant they needed help.
phData's Solution
phData designed a cloud-native IoT solution on Snowflake and Microsoft Azure, then helped migrate from Hadoop to validate production readiness. The manufacturer has transformed their small web app into a unified IoT data store, analytics, and visualization platform — all built around CI/CD and infrastructure-as-code to maximize the value of the cloud.
Results
The manufacturing corporation now has a proven path to further break down data silos and migrate more large applications from Hadoop to the modern Snowflake-based solution architected by phData. Before long, they’ll be able to empower customers using equipment across their entire product portfolio with all the improved efficiencies of an IoT data and analytics platform built from the ground up with phData’s Cloud 2.0 approach.
The Full Story
A leading manufacturer of earth-moving equipment, including construction, mining, and forestry equipment, has increasingly come to rely on sensor data to understand how their machines are performing.
Most machines they make, from excavators and front-end loaders to subsurface mining equipment and drills, include sensors that track indicators such as hydraulic pressure, engine RPM, engine oil temperature, and wheel speed. This Internet of Things (IoT) data lets them not only predict when individual machines require maintenance, but also help customers operate their products more efficiently by analyzing operational cycles and patterns.
Because many of these machines may run 24×7, and because such large machines often require similarly large capital outlays, these insights provide enormous business value. That value has materialized as increased top-line revenue from new products and services, including smart-connected equipment and post-purchase proactive maintenance services.
After several previous iterations, the manufacturer had been using a Hadoop-based solution to process, store, and analyze all their sensor data. However, maintaining the platform required their small analytics team to spend more time administering the cluster than getting value from the data they collected. In addition, the static resource allocation model meant they could not scale dynamically, and their compute costs were increasing.
As a result, they decided to explore how they might take advantage of the latest cloud-native services and data technologies to streamline systems management and improve efficiency, while simultaneously consolidating their siloed data sources.
Mountains of sensor data
To meet their goals and justify the costs of moving to a new platform, the manufacturer would need to design a modern, cloud-based data analytics solution. They also needed to ensure this solution could ingest their existing data from Hadoop and handle the high volume of new IoT data being pushed daily from their equipment.
Key Challenges
Designing and validating the right solution architecture
The manufacturer knew they wanted to move off Hadoop and take advantage of cloud-native data technologies; however, they were less sure about which of those technologies were right for the job, how they should be optimized, and how to demonstrate the feasibility of the new solution.
Moving mountains of IoT data
With sensors generating thousands of data points per minute, per individual machine, the revamped solution would need to handle billions of sensor records per day. And with more than 40 TB of existing data to ingest, migration from their Hadoop-based solution was bound to be a complex challenge.
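To see how per-machine sensor rates add up to billions of records per day, here is a rough back-of-the-envelope sketch. The rate of 2,000 data points per minute and the assumption that every machine runs around the clock are illustrative values, not figures from the manufacturer; only the 8 billion daily target comes from the case study.

```python
MINUTES_PER_DAY = 24 * 60


def daily_records(machines, points_per_minute):
    """Records produced per day if every machine reports continuously (assumed 24x7)."""
    return machines * points_per_minute * MINUTES_PER_DAY


def machines_needed(target_daily_records, points_per_minute):
    """Smallest fleet size that reaches the target daily volume (ceiling division)."""
    per_machine = points_per_minute * MINUTES_PER_DAY
    return -(-target_daily_records // per_machine)


if __name__ == "__main__":
    # Hypothetical rate: 2,000 data points per minute per machine.
    print(daily_records(1, 2_000))                  # records per machine per day
    print(machines_needed(8_000_000_000, 2_000))    # fleet size for 8 billion/day
```

Under these assumed numbers, a single machine produces about 2.9 million records per day, so a fleet of a few thousand connected machines is enough to reach the billions-per-day range.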
Unifying disparate systems and data
To handle the volume and heterogeneity of sensor data from all their different equipment (often transmitted from highly remote locations with poor internet connectivity), the manufacturer was parceling the data into files and uploading them once a minute. Accordingly, the new solution would need to incorporate their existing proprietary API, then somehow convert these files into a consistent and usable format. It would also need to serve as a central repository to help tear down corporate data silos and unify the multitude of existing systems of record.
Digging deeper with a Cloud 2.0 Architecture
The phData team worked closely with the manufacturer’s analytics team to understand both their existing Hadoop-based platform and their goals for overhauling it, then provided the technology recommendations and support they needed to successfully transform it.
To deliver the required improvements in efficiency, maintainability, and data accessibility, phData designed a new architecture around Snowflake. They combined the right mix of cloud-native services and data technologies (such as Spark and Kafka for data processing, and Microsoft Azure and Kubernetes for infrastructure and orchestration) with the right “Cloud 2.0” design and deployment practices (such as taking a containerized, “infrastructure-as-code” approach to deploying the Kafka Connector on Azure Kubernetes Service) to make the most of those technologies. Finally, they proved the viability of the new solution by helping to successfully migrate one of the manufacturer’s large applications from Hadoop to Snowflake.
The data files generated once per minute by the sensors are now uploaded to Azure blob storage via a proprietary REST API; then, these hundreds of millions of small files are processed and normalized by Spark before being transmitted to Snowflake via the Spark-Snowflake connector.
Once in Snowflake, the data is consolidated and enhanced, every two minutes, via a series of tables and schemas designed to flatten data structures and introduce new data columns that provide more ways to break down the data.
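The flattening and enrichment step can be pictured with a small sketch. The case study does not describe the file format or the actual columns, so the JSON layout, the field names (`machine_id`, `equipment_type`, `samples`, `readings`), and the derived `hour_of_day` column below are all hypothetical. In the real pipeline this work is done by Spark and by tables inside Snowflake, not by plain Python.

```python
import json
from datetime import datetime, timezone


def flatten_file(raw):
    """Turn one nested per-minute sensor file into flat rows, one per reading.

    Assumes a hypothetical layout:
    {"machine_id": ..., "equipment_type": ..., "samples": [{"ts": ..., "readings": {...}}]}
    """
    payload = json.loads(raw)
    rows = []
    for sample in payload["samples"]:
        ts = datetime.fromtimestamp(sample["ts"], tz=timezone.utc)
        for sensor, value in sample["readings"].items():
            rows.append({
                "machine_id": payload["machine_id"],
                "equipment_type": payload["equipment_type"],
                "sensor": sensor,
                "value": float(value),
                "recorded_at": ts.isoformat(),
                # Derived column: gives analysts another way to break down the data.
                "hour_of_day": ts.hour,
            })
    return rows


if __name__ == "__main__":
    example = json.dumps({
        "machine_id": "EX-001",
        "equipment_type": "excavator",
        "samples": [{"ts": 3600, "readings": {"hydraulic_pressure": 210.5, "engine_rpm": 1800}}],
    })
    for row in flatten_file(example):
        print(row)
```

Each nested reading becomes its own row with the machine-level attributes repeated, which is the shape a dashboard tool can filter and group without unpacking nested structures.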
The final result? A common data warehouse that’s easily accessible via Power BI dashboards.
Striking paydirt with IoT on Snowflake
Thanks to the solution architecture design and migration support from phData, the manufacturing corporation has transformed what started out as a small Microsoft SQL Server-based web application into a unified IoT data store, analytics, and visualization platform — one with the potential to now support the entire business:
- A proven foundation for IoT — By successfully executing a large Hadoop-to-Snowflake migration, phData helped prove the value and viability of the new Snowflake-based solution; as a result, the platform is already seeing surging adoption across more and more equipment types and product lines, and has garnered additional funding from corporate leadership.
- 8 billion IoT data points daily — The solution processes billions of sensor records on a daily basis, coming in from mining, construction, and other industrial equipment all around the world.
- Dynamic Scale Using Cloud — Snowflake makes it easy to “right-size” warehouses to the use case at hand. For example, to write all 8-10 billion daily sensor records to a persistent table, they can spin up a single 4X-Large warehouse to complete the job in minutes, for the same cost as running it much more slowly on smaller clusters. Also, by making the most of cloud infrastructure technologies like Microsoft Azure and Azure Kubernetes Service, the solution maximizes utilization and keeps costs at a minimum.
- More unified data, more collaboration — Data sets previously stored across multiple silos are now consolidated in Snowflake, with new data being added on a regular basis; this makes it much easier to share intelligence between groups, organizations, and customers.
- Simplified security and access control — Thanks to Snowflake’s Azure AD integration, the manufacturer can extend user access and manage identities and permissions with ease, in a secure fashion, with Single Sign On (SSO) for internal teams and external customers alike.
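The "same cost, much faster" point about warehouse sizing follows from how Snowflake bills standard warehouses: each size step doubles the credits consumed per hour. The sketch below works through that arithmetic. The credit rates are the standard per-size rates; the 8-hour Medium baseline and the assumption that the job speeds up linearly with warehouse size are illustrative, and real workloads may scale less cleanly.

```python
# Credits per hour for standard Snowflake warehouses; each size doubles the previous one.
CREDITS_PER_HOUR = {
    "X-Small": 1, "Small": 2, "Medium": 4, "Large": 8,
    "X-Large": 16, "2X-Large": 32, "3X-Large": 64, "4X-Large": 128,
}


def credits_used(size, runtime_minutes):
    """Credits consumed by one warehouse of the given size running for runtime_minutes."""
    return CREDITS_PER_HOUR[size] * runtime_minutes / 60


def runtime_if_linear(baseline_size, baseline_minutes, target_size):
    """Estimated runtime on target_size, ASSUMING perfectly linear speedup with size."""
    speedup = CREDITS_PER_HOUR[target_size] / CREDITS_PER_HOUR[baseline_size]
    return baseline_minutes / speedup


if __name__ == "__main__":
    # Hypothetical baseline: the daily load takes 8 hours on a Medium warehouse.
    baseline_minutes = 8 * 60
    fast_minutes = runtime_if_linear("Medium", baseline_minutes, "4X-Large")
    print(fast_minutes)                                  # minutes on a 4X-Large
    print(credits_used("Medium", baseline_minutes))      # credits for the slow run
    print(credits_used("4X-Large", fast_minutes))        # credits for the fast run
```

Under the linear-scaling assumption, the 4X-Large run finishes 32 times sooner while consuming the same number of credits, which is why choosing the larger warehouse for a large batch write can be cost-neutral.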
Take the next step with phData.
Learn how phData can help solve your most challenging data analytics and machine learning problems.