Resilience-driven and compliance-ready Site Reliability Engineering (SRE)

The Problem:

The EU's Digital Operational Resilience Act (DORA) requires financial entities and their critical ICT third-party service providers (including cloud and software vendors) to ensure operational resilience. This directly impacts SRE teams in four areas:

Risk Management: DORA mandates a comprehensive ICT risk management framework. SRE's focus on identifying and mitigating risks—from system dependencies to single points of failure—becomes a core part of a company's regulatory compliance.

Incident Response & Reporting: The regulation demands a clear, well-documented incident management process. SRE's expertise in handling incidents, conducting blameless post-mortems, and improving systems based on findings is now subject to formal reporting to regulators and financial institutions.

Resilience Testing: DORA requires regular, comprehensive digital operational resilience testing, including advanced threat-led penetration testing. SRE practices such as chaos engineering, disaster recovery testing, and stress testing are no longer optional best practices; they are regulatory obligations.

Third-Party Risk: The regulation places significant responsibility on financial entities for the resilience of their ICT service providers. This forces SRE teams at those providers to give their clients detailed, demonstrable proof of their operational resilience.

The Opportunity:

The Digital Operational Resilience Act (DORA) elevates Site Reliability Engineering (SRE) from a technical discipline to a strategic and regulatory-critical function. DORA mandates and formalizes many practices that are central to SRE, forcing companies to adopt a more rigorous and auditable approach to reliability.

Why Me:

Over the past five years, I have been involved in:

  • building a highly successful Cloud Strategy for one of the biggest insurers in the world

  • building a highly profitable PaaS for four different entities within the group

  • advising the Group's biggest program on setting up Platform Engineering and selling pre-cooked domains (e.g. Observability, CI/CD, Operational Model, SRE)

  • building a group product, Serverless Solution as a Service, the first serverless solution in the group

  • building a highly successful and celebrated Site Reliability Engineering (SRE) transformation within a scale-up

  • redesigning Incident Management into a business-driven process

Having personally designed, implemented, and sold SRE practices internally, and having advised the Group organisation on them, I deeply understand the technical and cultural challenges of SRE and the operational realities of a highly regulated environment.

Who the service is for:

I offer my expertise to large organisations investing in their SRE capabilities and to startups building SRE solutions.

What is included in the service:

I will guide your organisation in evolving your SRE capabilities beyond mere technical alerts towards true business intelligence and enterprise-grade operational excellence by:

Strategising the SRE role: I'll guide your organisation in moving beyond a reactive, technical-first approach to a proactive, regulatory-driven mission. DORA fundamentally changes the SRE mindset from a "best-effort" discipline to a business-critical function. My expertise will help your SRE teams build and prove resilience to meet strict, enforceable standards, turning compliance into a competitive advantage.

Formalising the SRE practice: To meet new regulatory demands, a formal and consistent approach is key. I'll help you codify your SRE practices, ensuring they are well-documented, consistently executed, and auditable. This includes:

  • Error Budget Management: Establishing clear, data-driven error budgets that align with business impact.

  • Incident Response Plans: Developing structured, repeatable incident response frameworks that prioritise communication and action based on business risk, not just technical severity.

  • Observability Frameworks: Building robust observability that connects technical signals directly to business outcomes, allowing for real-time insight into performance and impact.
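
To make the error budget item concrete, here is a minimal Python sketch of how a request-based error budget could be computed and turned into a release decision. Everything in it is an assumption for illustration: the `ErrorBudget` class, the `release_policy` function, the 99.9% target, and the 25% caution threshold are hypothetical and do not come from DORA or from any specific tool. Real thresholds should be derived from your own business impact analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ErrorBudget:
    """Request-based error budget for one SLO window (hypothetical model)."""

    slo_target: float     # e.g. 0.999 means 99.9% of requests must succeed
    total_requests: int   # requests observed in the SLO window
    failed_requests: int  # requests that violated the SLO

    @property
    def allowed_failures(self) -> float:
        # The budget is the share of requests the SLO permits to fail.
        return (1.0 - self.slo_target) * self.total_requests

    @property
    def consumed_fraction(self) -> float:
        # Fraction of the budget already spent; above 1.0 means the SLO is breached.
        if self.allowed_failures == 0:
            return float("inf") if self.failed_requests else 0.0
        return self.failed_requests / self.allowed_failures

    @property
    def remaining_fraction(self) -> float:
        return 1.0 - self.consumed_fraction


def release_policy(budget: ErrorBudget, caution_threshold: float = 0.25) -> str:
    """Map remaining budget to a release decision (thresholds are assumptions)."""
    remaining = budget.remaining_fraction
    if remaining <= 0.0:
        return "freeze"   # budget exhausted: prioritise reliability work
    if remaining <= caution_threshold:
        return "caution"  # budget nearly spent: restrict risky changes
    return "normal"       # healthy budget: ship as usual


if __name__ == "__main__":
    budget = ErrorBudget(slo_target=0.999, total_requests=1_000_000, failed_requests=250)
    print(f"consumed: {budget.consumed_fraction:.0%}, policy: {release_policy(budget)}")
```

Recording the inputs and the resulting decision for each SLO window is one way to produce the kind of auditable trail that a formalised practice calls for.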

SRE Accountability: In this new environment, SRE teams are on the front line of regulatory compliance. I'll help you establish clear lines of accountability, ensuring your SRE teams are equipped to provide the evidence and data required by regulators and financial institution clients. This shifts the focus from an internal IT function to a direct contributor to your company's legal and business standing.

SRE Strategic Alignment: The SRE team's work is no longer just a technical concern; it's a key part of your company's legal and business strategy for market entry and continued operation in Europe. I'll help you align SRE efforts with business goals, ensuring that resilience and reliability become a core part of your value proposition and a key driver for successful partnerships.