Customer's Challenge
A large U.S. Outdoor Vehicle Manufacturer was struggling to accurately forecast demand across hundreds of thousands of unique products, many of which lacked historical sales data. To account for complex seasonal factors and sporadic variability, they needed far more than simplistic data extrapolation. They needed modern machine learning.
phData's Solution
In just three months, phData’s two-person team delivered a modern, end-to-end ML solution that can account for complex variables and dependencies. Now for the first time, the manufacturer can forecast demand with high confidence (even for new products with zero historical sales data) to better support product launches and minimize stocking costs.
The Full Story
After a string of recent acquisitions, a top Outdoor Vehicle Manufacturer (10,000+ employees, revenue of $5+ billion) had rapidly expanded their product catalogue — not just with new vehicles, but with the thousands of parts required to service them as well.
The number of SKUs in their inventory ballooned with each new acquisition. It wasn’t long before they found themselves with 700,000 unique SKUs for various vehicle parts, and because they guarantee that replacement parts for any given vehicle will remain available for 10 years, the vast majority of those items aren’t going anywhere any time soon.
Naturally, managing such a massive catalogue presents a slew of challenges. One of the largest, as the company found out, was how difficult it became to accurately forecast demand across hundreds of thousands of unique products — particularly when accounting for seasonal or regional factors.
They realized that machine learning (ML) might help supplement their existing forecasting system (which was built on simple extrapolations of historical sales data for a given part) — but weren’t sure exactly how, or where they should start. And so they turned to phData.
Taking the plunge with machine learning
The two-person phData ML team worked closely with the manufacturer to understand their existing forecasting system, identify potential opportunities, and explore how ML might help them meet their key business goals — namely, improving their ability to predict demand and minimize the costs associated with purchasing and storing parts that turn out to be superfluous.
The next step was to review their options, establish the right goals for machine learning, and draw up a workable plan to get there.
The manufacturer’s existing forecasting system could generally provide adequate predictions for spare parts associated with the older vehicles in their catalogue; however, these predictions suffered from notable inconsistencies — especially for newer parts with less historical data to extrapolate from.
Forecasting difficulties included:
- Sporadic, “on-and-off” nature of demand for spare parts (i.e., a large percentage of “zero” values)
- Seasonal variation in demand across vehicle types (e.g., snowmobiles vs. fishing boats)
- Tendency of historical forecasting to react to old trends that are no longer indicative of future demand
- Variation in the proportion of identical parts shared between different vehicle models
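To make the first difficulty in the list above concrete, here is a minimal Python sketch of how sporadic demand can be profiled. The sales figures are invented for illustration, and the helper names (`zero_share`, `average_demand_interval`) are hypothetical; none of this comes from the customer's system. It measures how much of a history is zero and how a short trailing average, used here as a stand-in for simple extrapolation, depends on which months happen to fall inside the window.

```python
from statistics import mean

def zero_share(series):
    """Fraction of periods in which nothing was sold."""
    return sum(1 for units in series if units == 0) / len(series)

def average_demand_interval(series):
    """Average number of periods per non-zero demand (often abbreviated ADI)."""
    nonzero_periods = sum(1 for units in series if units > 0)
    return len(series) / nonzero_periods if nonzero_periods else float("inf")

# Hypothetical monthly unit sales for a single spare part (illustrative only).
monthly_sales = [0, 0, 4, 0, 0, 0, 7, 0, 0, 1, 0, 0]

sparsity = zero_share(monthly_sales)               # 9 of 12 months are zero
interval = average_demand_interval(monthly_sales)  # 12 months / 3 selling months

# A trailing three-month average, a stand-in for "simple extrapolation",
# swings with whichever months happen to fall inside the window.
recent_average = mean(monthly_sales[-3:])
overall_average = mean(monthly_sales)

print(f"zero share: {sparsity:.2f}, ADI: {interval:.1f}")
print(f"trailing 3-month average: {recent_average:.2f}, overall average: {overall_average:.2f}")
```

With this made-up history, three quarters of the months are zero, and the trailing average is a third of the long-run average simply because the window caught a quiet stretch.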
The teams using the system lacked visibility into how the forecasts actually worked and what specific input data was underpinning demand projections; it was nearly impossible to diagnose what was causing these problems, much less figure out how to solve them. As a result, the customer relied on a small team to create manual forecasts. Due to the size of the catalogue, manual forecasting was time-consuming and could only be done for a small percentage of the total items.
Meanwhile, because the customer lacked experience with more sophisticated ML models and algorithms, they weren’t sure what might or might not be possible with a more modern forecasting solution. For example, because their existing system was based solely on historical patterns (requiring at least 3 months of existing sales data), they hadn’t considered that forecasting demand tied to new vehicle launches was even a real possibility.
The price of poor forecasting
- Understocking = risking customers, today and tomorrow: Vehicle owners need the right parts at the right time. Having to wait up to 6 months for out-of-stock parts leads to unhappy customers, undermining brand loyalty and future sales.
- But overstocking = throwing money down the drain: Every part you make that goes unsold is a wasted investment. And not only that: storing and tracking unused parts can cost just as much, especially if, as this customer was, you’re at risk of having to open new warehouses just to stock more and more parts you’ll never use.
Unearthing novel opportunities
Our ML team walked the manufacturing company through their own process, helping them understand where their biggest pain points really were and what they would need to do to address them. From there, we designed a modern solution for more powerful, reliable demand forecasting. We leveraged Spark to manage the full data engineering and ML lifecycle, and we deployed it on Microsoft Azure.
And we didn’t stop there. While working to overhaul the customer’s existing forecasting, phData ML engineers identified their inability to forecast demand for new vehicle launches as a major gap — one the customer hadn’t realized it was even possible to solve for.
It posed a tricky data science problem. A successful model would need to deliver accurate predictions with zero baseline of historical sales data to extrapolate from — associating new parts with existing parts while correcting historical biases and accounting for the complex web of variables and interdependencies described above.
With the right ML expertise, however, a solution was both possible and eminently feasible — with the potential to deliver massive results.
Inventing novel solutions
Our team realized early on that the problem of forecasting demand for net-new items couldn’t be solved with a single algorithm. For example, using a single clustering algorithm to group similar parts or vehicles together yielded demand trendlines that weren’t predictive, or placed demand at entirely the wrong scale.
To account for the complexities at play, we designed two distinct ML models (one for predicting the average level of demand factors and another for predicting the level of variance across those factors), combining advanced algorithms such as XGBoost, RandomForest, k-nearest neighbors (KNN), and locality-sensitive hashing (LSH).
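As a rough illustration of the idea of estimating a demand level and a demand variance for a part with no sales history, the sketch below borrows both from the most similar existing parts. It is a simplified nearest-neighbour stand-in, not the models phData built: the part names, the two-dimensional feature vectors, the demand histories, and the function names are all hypothetical, and the real solution combined several algorithms that are not reproduced here.

```python
import math
from statistics import mean, pvariance

# Hypothetical catalogue: each existing part has attribute features
# (assumed to be pre-scaled to 0..1) and a short monthly demand history.
existing_parts = {
    "belt-A":   {"features": (0.9, 0.1), "demand": [4, 6, 5, 5]},
    "belt-B":   {"features": (0.8, 0.2), "demand": [3, 3, 3, 3]},
    "filter-C": {"features": (0.1, 0.9), "demand": [0, 20, 0, 20]},
}

def nearest_parts(features, catalogue, k):
    """Names of the k existing parts closest to `features` (Euclidean distance)."""
    ranked = sorted(
        catalogue,
        key=lambda name: math.dist(features, catalogue[name]["features"]),
    )
    return ranked[:k]

def cold_start_estimate(features, catalogue, k=2):
    """Estimate demand level and variance for a new part from its neighbours."""
    neighbours = nearest_parts(features, catalogue, k)
    # Level: average of the neighbours' mean monthly demand.
    level = mean([mean(catalogue[n]["demand"]) for n in neighbours])
    # Variance: average of the neighbours' month-to-month variance.
    spread = mean([pvariance(catalogue[n]["demand"]) for n in neighbours])
    return level, spread, neighbours

# A new part with no sales history, described only by its attributes.
level, spread, neighbours = cold_start_estimate((0.88, 0.12), existing_parts)
print(neighbours, level, spread)
```

Keeping the level and the variance as separate estimates means a part that resembles steady sellers is not forced onto the scale of a volatile one, which is the kind of scale mismatch a single grouping step can introduce.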
As a result, the customer now has the ability to forecast demand for new products with high confidence. When we tested the new solution on forecasts for 915 items, 82 percent had a mean absolute error (MAE) below 0.3.
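For readers unfamiliar with the metric, the following sketch shows how a per-item mean absolute error and the share of items under a threshold can be computed. The four items and their numbers are invented for illustration; the source does not publish the test data, the demand units, or the evaluation code, so this is only an assumed shape for such a check.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute difference between actual and forecast values."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def share_below(errors, threshold):
    """Fraction of error values strictly below `threshold`."""
    return sum(1 for e in errors if e < threshold) / len(errors)

# Hypothetical items: (actual demand per period, forecast demand per period).
items = {
    "part-1": ([0, 1, 0, 0], [0.25, 0.75, 0.0, 0.0]),
    "part-2": ([2, 0, 3, 1], [1.0, 1.0, 2.0, 1.0]),
    "part-3": ([0, 0, 1, 0], [0.0, 0.0, 0.5, 0.5]),
    "part-4": ([1, 1, 1, 1], [1.0, 1.0, 1.0, 0.5]),
}

# One MAE per item, then the share of items whose MAE is under the threshold.
per_item_mae = {
    name: mean_absolute_error(actual, forecast)
    for name, (actual, forecast) in items.items()
}
passing_share = share_below(list(per_item_mae.values()), threshold=0.3)

print(per_item_mae)
print(f"{passing_share:.0%} of items have MAE below 0.3")
```

Note that MAE is expressed in the same units as the demand being forecast, so a threshold such as 0.3 is only meaningful relative to that scale.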
As the new solution rolls into production, the customer can forecast their entire SKU catalogue for the very first time, without having to rely on manually produced forecasts.
Because our ML teams are highly interdisciplinary, bringing both data science skills and data engineering know-how, we helped ensure the solution would grow with the customer’s needs. Implementing the algorithms with PySpark enables the solution to scale horizontally, so it can process high volumes of data at very high speed.
And best of all? Our two-person ML team managed all this in record time — just three months!
On the art of algorithm arrangement:
"A real solution to a real problem is never just a single algorithm you can pick up from a library in Spark. The art is in how you choose and combine all the right algorithms and models for the job."
– phData ML Engineer
Powerful new predictive powers, in a matter of months
The modern forecasting and ML solution delivered by phData goes beyond improving the company’s existing demand forecasting — unlocking all-new predictive capabilities to better support product launches and further minimize stocking costs.
Business Outcomes
- Tangible value in three short months: phData’s ML experts managed to design and build a full end-to-end solution on Spark, uncover a huge greenfield opportunity for the customer, and deliver broader predictive capabilities with improved accuracy and consistency, all in just three months.
- Better forecasting, lower costs, improved customer service: With all-new forecasting capabilities come all-new savings in stocking costs. The customer now has everything they need to deliver customers the right part at the right time, including for newly launched vehicles that lack historical sales data.
- A platform for future success: The company now has a sustainable platform to implement the infrastructure and processes they need to scale, optimize, and maintain production-ready ML applications according to Spark and Azure best practices.
The next phase will include additional refinement of the model, as well as work to automate the customer’s ML pipelines using DevOps processes and toolchain components. This is expected to further improve forecasts and to lay a foundation for easily productionizing new ML pipelines in other parts of the business.
But already the payoff has been huge for the customer — improving their ability to predict demand for products new and old, so they can get customers the parts they need, without running up costs stocking warehouses full of superfluous parts.