Customer's Challenge
A top-5 U.S. restaurant chain turned to machine learning (ML) to continue delivering exceptional quality and service amid rapid growth. But to get hundreds of models into production using AWS and SageMaker, they realized they needed a more automated framework to streamline the ML pipelines they were counting on to scale up food and service quality.
phData's Solution
phData created an opinionated workflow or “assembly line” for training, deploying, updating, and securing ML models on AWS. Now the company can get ML models into production faster, more efficiently, and with less risk — empowering them to keep delivering superior food quality and service at an increasingly massive scale.
Results
The customer now has a foundation not only for their computer vision models, but for all the ML initiatives they’re depending on to scale their brand promise of superior food quality and customer experience. They’re hitting the ground running, working toward an ambitious goal of deploying 400+ forecasting models throughout 2020.
And they’re excited to continue working with phData — harnessing data and machine learning to power a range of innovative use cases: from speech processing that automates drive-thru ordering to demand micro-forecasting (accounting for complex factors like weather and traffic) that predicts exactly how many baskets of fries should be cooking to keep the drive-thru moving.
The Full Story
A top U.S. restaurant chain, already doing $10+ billion in annual sales, knew that to sustain their aggressive growth trajectory, they needed to continue making good on their core brand promise: delivering exceptional quality and customer service at a competitive price. They weren’t historically a tech-focused company. But to ensure consistency across many hundreds of restaurant locations, they needed to become one.
They launched several machine learning (ML) projects, powered by AWS, designed to maintain that differentiation across many hundreds of locations. Among them was a large set of computer vision models that ultimately became key to how they ensure food quality (e.g., what does a “good” sandwich look like versus a “bad” one?), as well as prototype models that will be key to improving order speed and accuracy at drive-thrus as they continue to grow.
However, it wasn’t long before the company understood that machine learning, like the restaurant business, becomes far more complex to do successfully at massive scale.
The ML learning curve
Originally, each ML model was hand-built in Jupyter on a laptop, deployed in a way that required manual runs, and maintained completely by hand. Because the company was, in effect, constantly reinventing the wheel, it took significant time and developer resources to get each new model into production. And for their computer vision use case — in order to recognize the many different items on their menu — they needed to train and deploy a lot of new models.
ML models were being cobbled together in scattered notebooks, Python programs, and R scripts — without the processes or controls of a central repository and version control. They had no way of tracking the life of a model from training through to production. And there was no central solution for monitoring the performance and quality of ML models over time.
The lack of these best practices contributed to serious inefficiencies and risks:
- Longer time-to-value — Developers had to dig through scattered notes and lines of code to piece together what they needed, then manually train a model for each new menu item.
- Increased costs — In addition to the high cost of developer time, hand-built models drove up infrastructure costs through over-allocation, making it hard to justify trying out new ideas such as drive-thru speech processing for order automation.
- Higher security risk — Each hand-built pipeline was one more chance to mistakenly make an S3 bucket public or create an over-permissive role.
- Technical debt — Models put into production without a monitoring solution or a traceable lineage contributed to significant technical debt and a strong likelihood of solution failure in production.
Ultimately, the restaurant company saw that to deliver quality food and service at scale, they also needed to deliver technology at scale. But making machine learning efficient and automated was easier said than done.
Quality on the line
Solving these problems meant building a system of standards, processes, and automated workflows robust enough to get ML models into production and ensure availability — a challenge even for seasoned data scientists.
The restaurant company had been supplementing their team with university students who had talent, but not necessarily experience. And due to the complex set of variables unique to the restaurant industry — for example, the linguistic quirks of patrons ordering at the drive-thru — a hodgepodge of canned, off-the-shelf solutions wasn’t an option. Realizing they needed proven data and ML experts, they ultimately decided to partner with phData.
The phData ML team worked to understand the company’s requirements, then created an opinionated workflow or “assembly line” built on a standard infrastructure stack including Apache Airflow, AWS SageMaker, and AWS Batch. The team combined this infrastructure foundation with information architecture, process, automation, and best practices for getting ML models trained, deployed, updated, and secured.
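To make the shape of that assembly line concrete, here is a minimal sketch of one stage: an Airflow DAG that launches a SageMaker training job. The DAG id, S3 paths, container image, and IAM role are hypothetical placeholders rather than the customer's actual configuration, and the import path assumes Airflow 2.x.

```python
# A minimal sketch of one "assembly line" stage: an Airflow DAG that launches
# a SageMaker training job via boto3. All names below (DAG id, bucket, image
# URI, IAM role) are hypothetical placeholders.
from datetime import datetime

import boto3
from airflow import DAG
from airflow.operators.python import PythonOperator  # Airflow 2.x import path


def launch_training_job(**context):
    """Kick off a SageMaker training job against the latest training data."""
    sagemaker = boto3.client("sagemaker")
    job_name = f"menu-item-cv-{context['ds_nodash']}"  # unique per run date
    sagemaker.create_training_job(
        TrainingJobName=job_name,
        AlgorithmSpecification={
            # Placeholder image URI; in practice this would point at the
            # team's training container in ECR.
            "TrainingImage": "123456789012.dkr.ecr.us-east-1.amazonaws.com/cv-train:latest",
            "TrainingInputMode": "File",
        },
        RoleArn="arn:aws:iam::123456789012:role/sagemaker-training-role",
        InputDataConfig=[{
            "ChannelName": "training",
            "DataSource": {"S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://example-ml-bucket/training-data/",
                "S3DataDistributionType": "FullyReplicated",
            }},
        }],
        OutputDataConfig={"S3OutputPath": "s3://example-ml-bucket/models/"},
        ResourceConfig={
            "InstanceType": "ml.p3.2xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 50,
        },
        StoppingCondition={"MaxRuntimeInSeconds": 3600},
    )


with DAG(
    dag_id="cv_model_training",
    start_date=datetime(2020, 1, 1),
    schedule_interval="@weekly",  # retrain on a fixed cadence
    catchup=False,
) as dag:
    train = PythonOperator(
        task_id="launch_sagemaker_training",
        python_callable=launch_training_job,
    )
```

In an assembly-line setup like the one described here, a stage like this would typically be generated from a template so that each new model plugs into the same standardized pipeline.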
As a result:
- New models can be quickly plugged in and deployed using AWS CodeBuild and Jenkins.
- Infrastructure and security are standardized using AWS CloudFormation, ensuring that every model is built and deployed according to best practices.
- Version control and code centralization with Git save time and reduce errors.
- AWS SageMaker provides visibility across the entire flow of turning training data into a production model.
- A monitoring solution is in place, using AWS CloudWatch, to provide visibility into production model performance and detect drift (see the sketch below).
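As an illustration of what that monitoring can look like, here is a minimal sketch that creates a CloudWatch alarm on a SageMaker endpoint's built-in error metric. The endpoint name, threshold, and SNS topic are hypothetical placeholders, not the customer's actual values.

```python
# A minimal sketch of endpoint monitoring: a CloudWatch alarm on a SageMaker
# endpoint's built-in 5XX error metric. Endpoint name, threshold, and SNS
# topic are hypothetical placeholders.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="cv-model-endpoint-5xx-errors",
    Namespace="AWS/SageMaker",            # built-in SageMaker endpoint metrics
    MetricName="Invocation5XXErrors",
    Dimensions=[
        {"Name": "EndpointName", "Value": "cv-model-endpoint"},
        {"Name": "VariantName", "Value": "AllTraffic"},
    ],
    Statistic="Sum",
    Period=300,                           # evaluate in 5-minute windows
    EvaluationPeriods=2,                  # require two bad windows in a row
    Threshold=10,                         # alert after 10 errors in a window
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=[
        "arn:aws:sns:us-east-1:123456789012:ml-oncall"  # placeholder SNS topic
    ],
)
```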
phData integrated the new ML components and workflow seamlessly with the cloud and DevOps tools the company was already using to manage infrastructure operations, such as Jenkins and AWS CloudFormation.
Scaling technology to scale quality customer experience
The new automated workflow the company created with phData has streamlined their ability to get ML models into production faster, more efficiently, and with less risk:
- More value, faster — By transforming a highly manual process into an automated assembly line for ML models, with standardized infrastructure automation and templatized workflows, teams can simply take new training data, drop it into storage, and run the pipeline to retrain and redeploy (see the sketch after this list).
- Lower costs, better ability to innovate — Standardized infrastructure minimizes over-allocation and waste. With a more feasible cost structure, developers who no longer have to build and deploy each model by hand can focus on solving new business problems.
- Improved reliability and risk mitigation — More automation means less human error. Infrastructure-as-code, templatized workflows, version control, and code centralization all reduce both security risk and the risk of errors that impact availability and performance.
- Minimized technical debt — Centralized monitoring and alerting, combined with highly visible lineage for each model, drastically reduced technical debt and increased the quality of the solutions in production over time.
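To illustrate that retrain-and-redeploy loop from a developer's perspective, here is a minimal sketch: new labeled data is dropped into S3, then the training pipeline is triggered on demand. The bucket, file, and DAG names reuse the hypothetical placeholders from the earlier sketches, and the trigger call assumes Airflow 2's stable REST API with basic authentication enabled.

```python
# A minimal sketch of the retrain-and-redeploy loop: drop new labeled training
# data into the standard S3 location, then trigger the pipeline. All names and
# URLs are hypothetical placeholders from the earlier sketches.
import boto3
import requests

# 1. Drop the new training data into the standard S3 location.
s3 = boto3.client("s3")
s3.upload_file(
    Filename="labeled_images_2020_week_14.tar.gz",
    Bucket="example-ml-bucket",
    Key="training-data/labeled_images_2020_week_14.tar.gz",
)

# 2. Kick off the training DAG instead of waiting for its weekly schedule
#    (assumes Airflow 2's stable REST API and a basic-auth service account).
response = requests.post(
    "https://airflow.example.com/api/v1/dags/cv_model_training/dagRuns",
    json={"conf": {"reason": "new labeled data uploaded"}},
    auth=("svc-ml-pipeline", "********"),  # placeholder credentials
)
response.raise_for_status()
```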
Take the next step with phData.
Learn how phData can help solve your most challenging data analytics and machine learning problems.