Overview
Our client is a leading global insurance company with complex enterprise systems supporting their operations across multiple regions. They engaged DX1 to strengthen their vulnerability management program and implement zero-downtime deployment capabilities to enhance both their security posture and operational resilience.
Phase 1 focused on three key applications, creating a blueprint for onboarding further applications in the future. The project addressed the critical need for continuous service availability during security and maintenance updates, enhancing the client's operational resilience while maintaining a robust security posture.
Challenges
- Service Disruption During Updates: Prior deployment processes required taking both application nodes offline simultaneously, creating unacceptable downtime windows that impacted business operations and customer experience.
- Inadequate Testing Frameworks: Existing Post-Implementation Verification (PIV) processes lacked automation, resulting in inconsistent deployment validation and a risk of undetected issues reaching production environments.
- Vulnerability Management Complexity: Security patch deployments followed traditional change management processes that were time-consuming and not aligned with modern agile methodologies.
- Insufficient Automated Testing: Limited daily automated functional testing increased the risk of undetected issues when deploying updates.
- Unclear Ownership: Vulnerability and patch management responsibilities were not clearly aligned to the Agile and Tribe models, creating ambiguity and inefficiencies.
Solutions
The Zero-Downtime Deployment (ZDD) program implemented a comprehensive solution to meet vulnerability management SLAs of 5 days for external-facing systems and 30 days for internal systems, significantly improving upon the previous 56-day baseline for vulnerability remediation. The solution utilised a pipeline architecture featuring load balancer integration, node-based sequential deployments (one node is updated while the others continue to serve traffic), and automated technical and functional PIV frameworks.
The implementation followed a phased approach, beginning with three key applications over three months before expanding to 38 tier 0/1 applications over nine months. The solution incorporated CI/CD pipelines, Infrastructure as Code (IaC) for configuration management, and an extensive automated testing framework including daily functional tests and end-to-end verification in both production and non-production environments.
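As an illustration only, the node-based sequential deployment described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the client's actual pipeline: the `LoadBalancer` class, the `rolling_deploy` function, and the `deploy` and `verify` callables are hypothetical names standing in for the real load balancer integration, deployment step, and automated PIV checks.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class LoadBalancer:
    """Hypothetical in-memory stand-in for a real load balancer pool."""
    in_service: Set[str] = field(default_factory=set)

    def drain(self, node: str) -> None:
        # Stop routing traffic to the node.
        self.in_service.discard(node)

    def enable(self, node: str) -> None:
        # Resume routing traffic to the node.
        self.in_service.add(node)


def rolling_deploy(
    nodes: List[str],
    lb: LoadBalancer,
    deploy: Callable[[str], None],   # assumed deployment step
    verify: Callable[[str], bool],   # assumed automated PIV check
) -> List[str]:
    """Update nodes one at a time so at least one node always serves traffic."""
    updated: List[str] = []
    for node in nodes:
        # Refuse to drain the last node that is still serving traffic.
        if not (lb.in_service - {node}):
            raise RuntimeError(f"cannot drain {node}: no other node in service")
        lb.drain(node)
        deploy(node)
        if not verify(node):
            # Leave the failed node out of rotation and halt the rollout;
            # the remaining nodes keep serving the previous version.
            raise RuntimeError(f"PIV failed on {node}; rollout halted")
        lb.enable(node)
        updated.append(node)
    return updated
```

With two nodes in the pool, the function drains and updates the first node while the second serves traffic, then swaps. If verification fails on a node, that node stays out of rotation and the rollout stops, so the untouched node continues to serve the previous version.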
Results
- Eliminated Planned Downtime: Successfully implemented rolling deployments that maintain continuous service availability during updates.
- Enhanced Security Posture: Shortened vulnerability remediation timeframes by streamlining the deployment process for security patches.
- Improved Testing Reliability: Automated PIV frameworks ensure consistent validation of all deployments, reducing the risk of undetected issues.
- Streamlined Processes: Clear ownership models and updated procedures aligned with Agile and Tribe structures improved efficiency and accountability.
- Enhanced Monitoring: Integrated health checks provide immediate feedback on deployment success, enabling faster remediation of any issues.
- Operational Resilience: Established a framework that can be expanded to additional applications in future phases.