This guide was co-written by Babak Tabesh and Allyse Fristo.
Organizations are undergoing data transformations and modernization efforts to remain competitive, but few can effectively build and sustain the platform of their vision. Building and operating a data platform at an exceptional level is not perceived as a competitive differentiator for most businesses and is therefore not prioritized appropriately. The most common culprit preventing data transformation is the lack of an operational roadmap that is aligned with the strategy and implemented from the onset.
Organizations typically prioritize data operations only after experiencing a string of high-profile incidents. Abruptly incorporated change management controls often overwhelm the platform, ultimately degrading organizational trust, which is difficult to earn and easily lost.
phData has created an Operational Maturity Framework (OMF) that provides a comprehensive, prioritized roadmap for leveraging cloud technology to drive business value. Here we will provide a clear guide of the components required to develop a high-performing data platform.
The Operational Maturity Framework (OMF) leverages phData’s experience and best practices developed across hundreds of customers that help businesses accelerate the adoption and scalability of their data platform.
1. In collaboration with the client, phData’s Elastic Platform Operations (EPO) team carries out the requisite information gathering to comprehensively evaluate the current state of the data platform.
2. The team scores the components of the data platform against industry best practices criteria using a combination of hands-on investigation and rubrics.
3. The OMF Scorecard is presented and reviewed with the client to align on the proposed prioritized roadmap.
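The scoring and prioritization steps above can be sketched in a few lines of Python. This is only an illustration, not phData's actual rubric: the 1-5 scale, the target score, the component names, and the scores themselves are all assumptions made for the example. Only the four pillar names come from the OMF.

```python
from statistics import mean

# Hypothetical rubric scores on a 1-5 scale (1 = ad hoc, 5 = optimized).
# Pillar names come from the OMF; components and scores are illustrative.
scores = {
    "Strategy": {"Platform objectives": 4, "Risk tolerance": 2},
    "Center of Excellence": {"Automation": 3, "Data governance": 2},
    "Core Operations": {"Observability": 1, "Disaster recovery": 3},
    "Team": {"Knowledge management": 2, "Coverage": 4},
}


def pillar_averages(scores):
    """Average the component scores within each pillar."""
    return {
        pillar: mean(components.values())
        for pillar, components in scores.items()
    }


def prioritized_gaps(scores, target=4):
    """List components scoring below the target, largest gap first."""
    gaps = [
        (target - score, pillar, component)
        for pillar, components in scores.items()
        for component, score in components.items()
        if score < target
    ]
    # sorted() is stable, so ties keep the order in which they were listed.
    return sorted(gaps, key=lambda gap: -gap[0])


for gap, pillar, component in prioritized_gaps(scores):
    print(f"{pillar} / {component}: {gap} point(s) below target")
```

Ranking by gap size is the simplest possible prioritization. A real roadmap would likely also weigh business impact and effort, which this sketch leaves out.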
To demonstrate the effectiveness of the operational maturity model, we can examine a real-life scenario where a large financial institution struggled to scale its modern data platform. By implementing the operational maturity model, the institution was able to identify and address gaps in its operations, leading to significant improvements in its platform’s performance and scalability.
The OMF defines four operations pillars:
Strategy – Aligns your data platform operations team to deliver on core business objectives and accelerate your data transformation initiatives.
Center of Excellence (CoE) – Codifies how you build and execute your operations strategy across the organization. This pillar focuses on establishing and promoting best practices. It aims to provide thought leadership and direction on new technologies and facilitate platform optimization through automation and education.
Core Operations – Focuses on the execution of processes necessary to manage any data platform.
Team – Defines the roles/responsibilities, skill sets, knowledge management, and personnel coverage required to operate and scale a modern data platform.
These four pillars are foundational for building a best-in-class data platform within an organization.
The strategy pillar consists of seven components that are required to align data platform objectives to the organizational business needs. It’s important to note that the common stakeholders include the C-Suite Leadership team. Transformations are most impactful when the organization decides to change, top to bottom, the way it uses data. Branded transformations help focus and motivate the organization at scale and facilitate the realization of business value as part of the transformation.
A critical step in the success of your data platform is clearly outlining the purpose and goals of the platform itself. Defining objectives and how they will be measured provides a standard for reference across all stages of platform development.
The objectives facilitate the value derived from the platform, which is provided in four important ways: increased revenue through new data products, increased efficiency of staff, decreased technology costs, and increased speed to market/business agility. Understanding how a data platform creates and provides business value and actively measuring that value is fundamental to long-term platform success.
Evaluate organization alignment with platform objectives to ensure data platform operations support business goals. If there are discrepancies, evaluate the root cause and establish a plan to remediate. Assess whether your organization reflects the desired behaviors, roles, and culture that you have determined are key elements to your success.
Many enterprises are made up of various business units. In a federated model, the business units have minimal reliance on IT to manage data. The business unit’s data and analytics function is responsible for reporting on behalf of the business domain.
In a centralized model, a single central organization controls and manages all the data and analytics for the enterprise. The distinction greatly influences how organizations adopt and scale data & analytics capabilities and technologies.
The level of risk that your organization is willing or able to accept should be understood, documented, and communicated at an organizational level. This helps minimize the potential for compliance failures and allows for metrics to be tracked and reported to validate adherence. It also helps right-size the work in areas where value may not be perceived by the greater organization.
Costs associated with operating a modern data platform include staffing costs, technology costs, and supplier costs. This also covers the reduction of the Total Cost of Ownership (TCO) and liquidation of capital assets by downsizing the IT infrastructure, which is a key element of realizing the benefits of cloud transformation.
Additionally, continuously tracking and reporting costs allows for a true understanding of the value created by the digital products and subsequent business decisions on whether it is worth investing in development and maintenance.
Any organization that works with an evolving set of technologies must itself be designed to adapt to changing circumstances. Establishing flexibility as a desired goal of the platform operations model facilitates internal agility to meet new or changing business requirements. This is key to mitigating locked-in situations and driving long-term success.
The Analytics Maturity framework is a model that helps assess the current level of analytical capability of each business unit and aids in identifying areas of improvement. The model is used to evaluate the platform’s ability to collect, store, and analyze data, as well as its ability to optimize through more advanced analytical techniques such as predictive modeling and machine learning.
The model typically includes several stages of maturity, ranging from descriptive to prescriptive analytics, with each stage building upon the capabilities of the previous one. The goal of this component is to identify areas where the analytics capabilities can be enhanced to drive incremental business value.
This pillar is composed of eight components that are required to codify how operations strategies are best built and executed across the organization.
Establish and maintain guidelines, principles, patterns, and guardrails for the data platform. Create consensus within the organization for enterprise standards that will drive platform adoption. By defining processes and procedures, each new data product will not have to reinvent proper authentication, security, networking, and logging and monitoring implementations.
Exercise authority and control over data to meet all platform stakeholder expectations. Business processes and analytics capabilities depend on accurate, complete, timely, and relevant data. Define and assign key roles, including data owners, stewards, and custodians.
Consider adopting a federated data mesh approach to data governance. This allows business units the autonomy to execute standards for their particular environment by making data more accessible and available across the user base. To do so, specify standards, including data dictionaries, taxonomies, and business glossaries at a central level. Identify what datasets need to be referenced and model the relationships between reference data entities.
Best-in-class Data Governance is a combination of people, process, and technology, and for that reason, it expands across all four pillars. The core components of a data governance program include:
Automation increases efficiency, reduces operational overhead, and lowers the risk of manual error, freeing up resources that can be redirected toward product development. By automating repeated tasks such as user and resource provisioning, application deployments, firewall changes, and monitoring/alerting configurations, organizations can devote more time and attention to their business goals.
Some examples of common automation features include elastic scaling capacity to ensure resources are properly allocated for workloads, automation of security practices, and building out systems and objects via automated scripts.
The platform’s environment, data, and technology landscape are in a constant state of flux. Inevitably, there will be a need to debug, develop feature enhancements, incorporate process engineering, and re-engineer various aspects of the platform.
Proper planning before changes to upstream data sources will reduce or eliminate the impact on users and dependent systems. To successfully adopt and scale modern data platforms, it is critical to account for and deliver these improvements.
Continuously monitor, evaluate, manage, and improve the effectiveness of your security and privacy programs. The organization, and the customers you serve, need trust and confidence that the controls you have implemented will enable you to meet regulatory requirements and effectively manage security and privacy risks in line with your business objectives and established risk tolerance.
Develop, maintain, and effectively communicate security roles, responsibilities, policies, processes, and procedures. Ensuring clear lines of accountability is critical to the effectiveness of your security program. Taking inventory of your data assets, security risks, and compliance requirements that apply to your industry and/or organization will help prioritize security efforts.
Adhering to best practices allows organizations to leverage lessons learned and avoid common pitfalls. Providing technical leadership and coaching on these best practices is critical to the adoption and scalability of modern data technologies and platforms. Examples include guidance on core reference architectures, application architecture, and user and developer guidance on how to achieve desired outcomes.
Create a getting started guide for the different personas that will be platform stakeholders and develop a communication channel for sharing best practices. Provide timely updates on platform enhancements and availability. A runbook can also be an excellent way of providing an easily accessible reference document for visibility into platform processes, both for operations teams and for stakeholder awareness. These may be broad runbooks that capture general platform operations or more specific runbooks that provide insight into a specific domain.
Continuous iteration over areas for platform improvements improves user experience and reduces friction for achieving the platform’s objectives. This optimizes overall platform performance, lowers cost, and increases the business value derived from the system.
The nine components of Core Operations are designed to future-proof the data platform operations by developing robust and scalable processes.
When operating at cloud speed and scale, it is important to identify problems as they arise, ideally before they disrupt the customer experience. Develop the telemetry (logs, metrics, and traces) necessary to understand the internal state and health of your workloads. Proactively monitor application endpoints, assess the impact on the end-users, and generate alerts when measurements exceed thresholds.
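As a minimal sketch of threshold-based alerting, the snippet below compares a set of measurements against configured limits and returns an alert for each breach. The metric names, limits, and the `Threshold` and `evaluate` names are hypothetical. In practice this logic usually lives in a monitoring tool rather than in hand-written code.

```python
from dataclasses import dataclass


@dataclass
class Threshold:
    metric: str   # hypothetical metric name, e.g. "p95_latency_ms"
    limit: float  # alert when the measurement is above this value


def evaluate(measurements, thresholds):
    """Return an alert message for every measurement above its limit."""
    alerts = []
    for threshold in thresholds:
        value = measurements.get(threshold.metric)
        # Metrics with no measurement are skipped here; a stricter policy
        # might treat missing telemetry as an alert in its own right.
        if value is not None and value > threshold.limit:
            alerts.append(
                f"{threshold.metric}={value} exceeds {threshold.limit}"
            )
    return alerts


thresholds = [
    Threshold("p95_latency_ms", 500.0),
    Threshold("error_rate", 0.05),
]
measurements = {"p95_latency_ms": 850.0, "error_rate": 0.01}
alerts = evaluate(measurements, thresholds)
```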
Having clear paths defined for user onboarding and audits allows for increased adoption and scalability and the capacity to comply with security and platform governance requirements. Automating both of these foundational capabilities greatly improves the potential for scalability.
By connecting the identity source and granting users the necessary permissions, appropriate users can efficiently and securely sign in, access, and provision resources. This ensures that the right people and machines can access the right resources under the right conditions.
The ability to identify, debug, and restore service as quickly as possible provides trust in the platform. Even more important is ensuring that communication throughout any incident or problem is timely, clear, and gives stakeholders confidence in the resolution process.
It is essential to then follow up with a root cause analysis to assess causal factors that are contributing to recurring problems. The next level of platform maturity is to introduce monitoring to notify in advance of the issue occurring and to become increasingly proactive in predicting similar incidents using trend analysis.
With cloud adoption, processes for responding to service and application health issues can be highly automated, resulting in greater service uptime. Automate responses to events to reduce errors caused by manual processes and ensure prompt and consistent responses.
Maintaining consistent infrastructure provisioning in a scalable and repeatable manner becomes more complex as the organization grows. Streamlined provisioning and orchestration will help achieve consistent governance to meet compliance requirements. Adopt best practices for Infrastructure as Code (IaC) to execute faster, improve quality, and drive down costs.
Evolve and improve applications and services at a faster pace than organizations using traditional software development and infrastructure management processes with the use of CI/CD. Adopting DevOps practices with continuous integration, testing, and deployment will help you to become more agile, allowing you to innovate faster, adapt to changing markets better, and become more efficient at delivering business objectives.
Timely restoration after disasters and security events helps you maintain system availability and business continuity. Building cloud-enabled backup solutions requires careful consideration of existing technology investments, recovery objectives, and available resources.
Develop a disaster recovery plan as a subset of your business continuity plan. Identify the threat, risk, impact, and cost of different disaster scenarios for each workload and specify Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) accordingly. Implement your chosen disaster recovery strategy leveraging multi-AZ or multi-region architecture. Review and test your plans regularly and adjust your approach based on lessons learned.
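One way to make RTOs and RPOs actionable is to check them against what the platform actually does. The sketch below assumes that worst-case data loss is roughly one backup interval and that restore time comes from the most recent recovery test. The `Workload` fields and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class Workload:
    name: str
    rpo: timedelta              # maximum tolerable data loss
    rto: timedelta              # maximum tolerable downtime
    backup_interval: timedelta  # how often backups are taken
    tested_restore: timedelta   # restore time measured in the last DR test


def dr_findings(workload):
    """Report where current practice cannot meet the stated objectives."""
    findings = []
    # If a failure hits just before the next backup, up to one full
    # interval of data is lost, so the interval must not exceed the RPO.
    if workload.backup_interval > workload.rpo:
        findings.append(
            f"{workload.name}: backup interval {workload.backup_interval} "
            f"exceeds RPO {workload.rpo}"
        )
    if workload.tested_restore > workload.rto:
        findings.append(
            f"{workload.name}: tested restore {workload.tested_restore} "
            f"exceeds RTO {workload.rto}"
        )
    return findings


orders = Workload(
    name="orders_pipeline",
    rpo=timedelta(hours=1),
    rto=timedelta(hours=4),
    backup_interval=timedelta(hours=6),
    tested_restore=timedelta(hours=2),
)
findings = dr_findings(orders)
```

Here the restore time is within the RTO, but a six-hour backup interval cannot satisfy a one-hour RPO, so the check returns a single finding.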
Introduce and modify infrastructure and applications while minimizing the risk to production environments.
Establish change processes that allow for automated approval workflows that align with the agility of modern technologies. Use deployment management systems to track and implement changes.
Use frequent, small, and reversible changes to reduce the scope of a change. Test changes and validate the results at all lifecycle stages to minimize the risk and impact of failed deployments. Automate rollback to previously known good state when outcomes are not achieved to minimize recovery time and reduce errors caused by manual processes.
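The automated rollback described above can be sketched as a small wrapper around a deployment step. The `apply` and `validate` callables stand in for whatever deployment tool and post-deployment checks are in use. They, and the in-memory `environment`, are hypothetical and not tied to any specific product.

```python
def deploy_with_rollback(known_good, candidate, apply, validate):
    """Apply a candidate version and revert if validation fails.

    Returns the version left running.
    """
    apply(candidate)
    if validate(candidate):
        return candidate
    # Validation failed: restore the previously known good state
    # automatically instead of waiting for a manual fix.
    apply(known_good)
    return known_good


# Usage with an in-memory stand-in for a real environment.
environment = {"version": "v1"}


def apply(version):
    environment["version"] = version


def validate(version):
    # Pretend the post-deployment smoke test fails for v2 only.
    return version != "v2"


running = deploy_with_rollback("v1", "v2", apply, validate)
```

Keeping each change small makes `validate` cheaper to write and the rollback less disruptive, which is the point of the guidance above.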
Providing prompt responses to requests, issues, and questions from users is critical to the success of any data platform. Business users, Developers, InfoSec teams, Executives, and other users will interact and have different responsibilities as it relates to the platform; each will have various needs and requests.
The support team must triage these requests, prioritize, address, delegate, and/or escalate to the appropriate team. Having the ability to notify the user community of changes, feature enhancements, outages, and maintenance to the data platform is crucial to maintaining user trust and driving adoption of the platform.
Systematically distribute and apply software system and service updates. Software updates address emerging security vulnerabilities, fix bugs, and introduce new features with minimal interruptions to business activities. A systematic approach to patch management will ensure that you benefit from the latest updates while minimizing risks to production environments.
Apply important updates during your specified maintenance window and critical security updates as soon as possible. Notify users in advance with the details of the upcoming updates and allow them to defer patches when other mitigating controls are available.
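The patching policy above can be expressed as a small decision function. This is a sketch under stated assumptions: the severity labels and return values are made up, and the rule that critical updates cannot be deferred is an interpretation of the text rather than something the framework states explicitly.

```python
def patch_decision(
    severity,
    in_maintenance_window,
    deferral_requested=False,
    has_mitigating_control=False,
):
    """Decide what to do with a pending update.

    Returns one of "apply_now", "wait_for_window", or "defer".
    """
    # Assumption: critical security updates are applied as soon as
    # possible and are never deferred.
    if severity == "critical":
        return "apply_now"
    # A user may defer only when another control already mitigates the risk.
    if deferral_requested and has_mitigating_control:
        return "defer"
    # Other updates are applied during the specified maintenance window.
    return "apply_now" if in_maintenance_window else "wait_for_window"
```

For example, an important update requested outside the window returns `"wait_for_window"`, while the same update with an approved deferral and a mitigating control returns `"defer"`.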
The Team pillar focuses on five critical components that help businesses create and sustain a data-driven culture.
Build digital acumen to confidently and effectively leverage cloud technologies to accelerate business outcomes. The requirement for an exceptional workforce goes beyond adapting to a digital environment. The greatest challenge is not the technology itself but rather the ability to hire, develop, retain, and motivate a talented, knowledgeable, proficient, and high-performing workforce that can evolve alongside the technology.
Knowledge of business processes or technical expertise is often siloed, specific to one small team or even a single individual. At best, this creates significant inefficiencies; at worst, a single point of failure. Distributing knowledge facilitates the upskilling of the broader team and improves visibility and traceability for business-critical processes and expertise.
Implement a means of documenting and disseminating information that is readily adhered to by the entire operations team. Design the knowledge management process to be regularly updated with evolving information and process changes.
The different stages in platform development require varied expertise and involvement from the operations team. Scaling here pertains both to scaled capacity and to scaled expertise for niche situations that mandate advanced skill sets. Having a team that can readily scale and remain agile to the evolving requirements is key to platform success throughout its lifecycle.
Having a clear separation of duties and a common definition of roles allows for more efficient operations and higher-performing teams. Clarity of ownership drives increased accountability and helps to avoid the common pitfalls associated with ambiguous scope. This ensures that all aspects of a best-in-class operations team are accounted for and delivered on.
Having clear metrics and processes for understanding the capacity of the team creates better morale, increased retention, and more efficient operations. Identifying the off-hours support model reduces people working weekends or evenings, which often leads to burnout. Proper escalation paths for incidents and issues that arise along with L1, L2, L3 coverage are markers of mature platform operations.
Defining your model for 24×7 support, after-hours escalations, and coverage for when team members are sick or on vacation ensures continuity of platform health and averts otherwise avoidable issues.
Rapid advances in technological innovation continue to accelerate the need for continuous data modernization, resulting in compounding expectations from IT organizations. The OMF was developed to help organizations leverage phData’s experience and best practices to accelerate their platform’s adoption and scalability. A comprehensive evaluation of the platform’s strategy, center of excellence, core operations, and team provides clarity of direction and prioritization to ensure your organization can get the most return on time and resources.
phData’s Elastic Platform Operations team uses this framework to continually assess each pillar and its respective components. Through a process of discovery, the team scores each critical area to identify and prioritize the gaps in the platform’s maturity.
At phData, we know this work is foundational to your success, and we are here to help. Drawing from years of experience, learning, iterating, and doing this for customers at every stage of their life cycle, we’ve built a set of tools, processes, reference architectures, and a team that is ready to help you get on the right path toward better utilizing data and analytics within your organization.
For a comprehensive evaluation of your data platform operations and a prioritized roadmap to jump-start your operational maturity, reach out to your phData AE.