Customer's Challenge
A top U.S. agribusiness and dairy company found themselves heavily dependent on an Excel-based forecasting simulation to optimize the profit margins on their product. However, maintaining and scaling such a complex tool built within a single, massive spreadsheet had become infeasible — putting their entire business model at risk.
phData's Solution
phData helped rewrite the convoluted spreadsheet-based Monte Carlo simulation as a simple, easy-to-use Python application — thoroughly validating the reconstructed solution, enabling all-new forecasting capabilities, and dramatically improving performance and automatability — so the agribusiness can sustainably maximize margins and growth.
The Full Story
For a leading U.S. agribusiness and dairy company ($15BN+), the ability to make data-driven decisions provides a critical survival advantage. Their business model depends on squeezing the maximum-possible profit out of every drop of milk their members produce. Now, with growing competition from alternative dairy products, the importance of optimizing their tight profit margins is greater than ever.
To determine how much product should be contracted for sale as milk vs. retained for sale as byproducts like yogurt or powdered milk, the agribusiness company had been relying on a Monte Carlo simulation tool. However, the simulation — which was, by necessity, extremely complex, weighing an enormous range of inputs, potential risks, and degrees of randomness — was built entirely in Excel, as a single, gargantuan spreadsheet.
Not only was it difficult to use; it was a nightmare to maintain. The agribusiness company knew they needed a more sustainable, performant, and scalable solution to help them sell their product more profitably.
Milking the product for all it’s worth
Overhauling such an involved simulation was clearly going to require a herculean effort — especially given the fundamental complexities of the business problem at-hand.
The simulation’s purpose was to predict just the right volume of milk that the agribusiness company should contract to sell in order to maximize their overall profit margins. Contract too much milk, and they’d be left with insufficient stock to produce their most profitable byproducts; but contract too little, and they’d have to offload more volume into increasingly less-profitable byproducts — or in a worst-case scenario, to dump unsold milk.
To find that sweet spot, the simulation has to account for an enormous spectrum of known and unknown factors. These include:
- How much milk the agribusiness company needs to sell
- The demand, price, and profit margins for milk
- The demand, price and profit margins across the hierarchy of byproducts the remaining milk could be sold as
- The capacity and efficiency of the dryers that produce those byproducts
- Seasonal variability in milk production and demand
And that’s just scratching the surface.
Naturally, a simulation with such a large range of dependent inputs and variables, was incredibly difficult to manage as a spreadsheet:
- Endless hidden dependencies — In Excel, the dependencies between different variables in the simulation are encapsulated in individual cells. Because there’s no easy way to see which columns are dependent on which, it’s extremely hard to track how changes to one part of the model are impacting other parts.
- Poor visibility and change tracking — In Excel, it’s extremely easy to lose track of what changes were made to what data. And because minor changes to the input assumptions can dramatically change the simulation, it can quickly become difficult to trust in the forecast.
- Infeasibility of extended forecasting — Extending the forecasting time-frame from 1 year to 3 years would have major business advantages, but would more than triple the difficulty of maintaining and executing the solution, with 3X the number of hidden dependencies and columns to track, as well as the added data bogging down Excel.
- No path to production — It’s very difficult to automate a spread-sheet based simulation like Oracle Crystal Ball. And even if you did, it would be nigh-impossible to put it into production: running this simulation through 1,000 cycles in Excel would generate 1,000 pop-ups windows hitting the backend server.
- Suboptimal performance — Because Excel isn’t designed to utilize 100% capacity as you add more CPU cores, running the simulation can take significantly longer than if it were built on a more powerful programming language built to scale horizontally.
Knowhow and grit: a recipe for success
The agribusiness company’s technical leaders ultimately decided to rebuild their unwieldy, spreadsheet-based Monte Carlo simulation as a simple, easy-to-use Python application; however, their own data science team was already overloaded and it takes them a considerable time to hire new talent. So, looking for a partner they could trust to tackle such a massive, complex project — and to tackle it fast — they turned to the phData Data Science team.
And the team delivered: from hypothesis analysis to input simplification and solution validation, they rewrote and implemented the model in Python, then worked tirelessly to ensure the reconstructed simulation performed as expected.
To accurately reconstruct the solution, the phData team had to dive deep into the original Excel-based solution — churning through countless columns and individual cells to account for all the hidden dependencies, all the undocumented changes, and all the many inconsistencies in data structure, even within individual columns — as well as continually revising their Python code as they wrote it to keep up with a business that was rapidly evolving while the project was still in-flight.
However, because they came equipped with both the data science expertise to understand what made the original simulation tick and the grit and dedication to help the customer succeed — at one point, during the validation phase, spending an entire weekend to track down a single discrepancy amidst the thousands of spreadsheet cells — the phData team got the job done, and got it done fast.
The benefits of a job well-done
With their new-and-improved Monte Carlo simulation — lately reborn as a much easier-to-maintain Python application — the agribusiness company now has a much more sustainable foundation to help them maximize the overall profit margins on their product.
Business Outcomes:
- A reliable, maintainable solution — Unlike with a spreadsheet, it’s a snap to look at the new solution’s python code and understand the many different dependencies. And with proper source code management, versioning and change tracking will make it far easier to keep track of tweaks to the complex set of inputs informing the simulation.
- Extended forecasting, improved decision making — The company is now able to feasibly extend the forecasting simulation from one year to three without tripling the difficulty of managing the simulation. This not only only allows them to plan their contracted volume of milk sales farther in advance; it lays the groundwork to accumulate the historical data they need to create a regression and evolve from a simulation into a true machine-learning forecasting solution.
- Automatability and production readiness — As a Python-based application, the solution is now far more easy to automate, and far more capable, ultimately, of being eventually deployed into production, if required.
- Optimized performance for faster insights — The agribusiness company can now scale its simulation more efficiently with massively parallel processing. Whereas the spreadsheet simulation had taken 3 hours to run 12 volumes (with each volume being 1,000 iterations of the simulation), the new version can simulate nearly twice the number of volumes in half the time (21 volumes in 1.5 hours) using 16 CPUs at 100%, with the potential to add any number of additional CPUs for additional improvements.
Take the next step
with phData.
Learn how phData can help solve your most challenging data analytics and machine learning problems.