Progressive Delivery and the AWS Well-Architected Framework

Published June 23, 2025

The AWS Well-Architected Framework is a set of guiding principles designed to help teams build and operate robust, secure, efficient, and cost-optimized systems on AWS, though most of its guidance applies to any cloud provider.

But why should you care about this framework? It's more than just a set of best practices; it's a strategic guide for building cloud solutions that are resilient, performant, and cost-effective. Adhering to the Well-Architected Framework helps organizations mitigate risks, improve operational efficiency, and ultimately deliver a better experience to their users.

At MultiTool, we believe progressive delivery is an underrated way to achieve the best practices the framework outlines, and ultimately to deliver a better experience to your end users. Progressive delivery, which includes strategies like canary deployments, ring deployments, and blue/green deployments, gradually exposes new software versions to a subset of users. This approach minimizes risk and enables real-time feedback. Let's explore how progressive delivery aligns with certain pillars of the AWS Well-Architected Framework.

Operational Excellence: Automate, Monitor, and Improve

The Operational Excellence pillar focuses on running and monitoring systems to keep users happy while continuously improving processes and procedures. Progressive delivery is inherently aligned with this pillar because it emphasizes automation, real-time monitoring, and continuous improvement through iterative, low-risk deployments.

Automating the rollout of new features through canary or blue/green deployments reduces manual errors and ensures consistent deployments. Consider a CI/CD pipeline integrated with a progressive delivery tool: upon successful unit and integration tests, a new container image is automatically deployed to a canary environment. You might have Prometheus metrics or CloudWatch alarms configured for elevated error rates (e.g., HTTP 5xx responses or increased p99 latency) on the canary instance, but these threshold-based alerts are only half the battle: by the time they fire, users have already been affected. To create a true feedback loop, fully automated rollbacks are crucial.
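A minimal sketch of what such an automated promote-or-rollback decision might look like. The function, its thresholds, and the sample metric values are all illustrative; in a real pipeline the numbers would come from your observability stack (e.g., CloudWatch or Prometheus queries comparing the canary fleet to the stable fleet).

```python
from dataclasses import dataclass

@dataclass
class CanaryVerdict:
    promote: bool
    reason: str

def evaluate_canary(baseline_5xx_rate: float,
                    canary_5xx_rate: float,
                    baseline_p99_ms: float,
                    canary_p99_ms: float,
                    max_error_delta: float = 0.01,
                    max_latency_ratio: float = 1.25) -> CanaryVerdict:
    """Compare canary metrics against the stable baseline and decide
    whether to promote or roll back. Thresholds are illustrative."""
    if canary_5xx_rate - baseline_5xx_rate > max_error_delta:
        return CanaryVerdict(False, "elevated 5xx rate on canary")
    if canary_p99_ms > baseline_p99_ms * max_latency_ratio:
        return CanaryVerdict(False, "p99 latency regression on canary")
    return CanaryVerdict(True, "canary within thresholds")

# A slightly elevated error rate and latency, both within tolerance:
verdict = evaluate_canary(0.002, 0.004, 180.0, 200.0)
```

Running this check on a schedule (or on each metric window) and wiring the `promote=False` branch directly to your rollback mechanism is what closes the feedback loop, rather than waiting for a human to respond to an alarm.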

This loop, which must be driven by empirical data from production traffic, can also foster a culture of continuous improvement, as insights from each deployment inform future development cycles and refine automated processes.

Reliability: Building Resilient and Recoverable Workloads

The Reliability pillar is about ensuring a workload performs its intended function correctly and consistently, and can recover from disruptions on its own. Progressive delivery directly contributes to reliability by:

  • Minimizing downtime and blast radius: Instead of a risky "big-bang" deployment, progressive delivery allows for seamless transitions between old and new versions. For example, in a canary deployment, only a small percentage of traffic (e.g., 5%) is routed to the new version at first. If the new version encounters issues, traffic can be quickly shifted back to the stable, older version with minimal user disruption. This strategy significantly reduces the blast radius of potential failures, ensuring that only a small subset of users or requests are affected by a problematic deployment.
  • Faster problem detection: By having an automated system monitor key metrics (e.g. p99 response time, CPU usage, or memory usage) and error rates (e.g. number of 4xx or 5xx errors) during a progressive rollout, the system can identify performance regressions or critical bugs more quickly than a human can, and can roll back the changes before they impact a large segment of your users.
  • Reduced risk of widespread outages: The ability to roll back a problematic deployment to a known good state within minutes instead of hours drastically reduces the likelihood of prolonged outages and ensures business continuity. This is particularly valuable for critical microservices like authentication services, where a bad deployment could cascade into failures across dependent services.
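The traffic-splitting and rollback mechanics above can be implemented on AWS with weighted target groups on an Application Load Balancer listener. The sketch below builds the forward-action payload you would pass to boto3's `elbv2.modify_listener`; the target group ARNs are placeholders, and the helper itself is illustrative rather than part of any particular tool.

```python
def weighted_forward_action(stable_tg_arn: str,
                            canary_tg_arn: str,
                            canary_percent: int) -> dict:
    """Build an ALB listener forward action that splits traffic between
    the stable and canary target groups. Rolling back is simply calling
    this again with canary_percent=0 and re-applying the listener."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return {
        "Type": "forward",
        "ForwardConfig": {
            "TargetGroups": [
                {"TargetGroupArn": stable_tg_arn, "Weight": 100 - canary_percent},
                {"TargetGroupArn": canary_tg_arn, "Weight": canary_percent},
            ]
        },
    }

# Applying it would look roughly like:
#   boto3.client("elbv2").modify_listener(
#       ListenerArn=listener_arn,
#       DefaultActions=[weighted_forward_action(stable_arn, canary_arn, 5)])
action = weighted_forward_action("stable-tg-arn", "canary-tg-arn", 5)
```

Because the weights are just listener configuration, shifting 5% of traffic to the canary, then 25%, then 100%, or back to 0% on failure, is a fast control-plane change rather than a redeploy, which is what keeps the blast radius and recovery time small.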

Performance Efficiency: Optimizing Resource Utilization

The Performance Efficiency pillar focuses on using computing resources efficiently to meet system requirements and maintain that efficiency as demand changes and technologies evolve. Progressive delivery contributes to performance efficiency in several ways:

  • Real-time performance validation: During a canary rollout, your deployment tool observes the performance of the new version under real-world load conditions. This allows the system to automatically identify performance bottlenecks or regressions before they impact all users, and to prompt you to optimize accordingly. For instance, if a new endpoint in your application significantly increases latency or CPU utilization on the container, a canary deployment would expose this immediately, preventing widespread performance degradation.
  • Optimized resource scaling: By understanding the performance characteristics of new versions in a live environment, you can make more informed decisions about resource allocation and auto-scaling configurations. If the canary release exhibits higher resource consumption per transaction, you can adjust your Auto Scaling group's target tracking or step scaling policies to provision additional instances proactively, ensuring you have enough resources without over-provisioning and incurring unnecessary costs.
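A back-of-the-envelope way to turn canary observations into a capacity decision: estimate fleet size from per-request CPU cost measured on the canary, keeping average utilization under the target that a target tracking policy would hold. All numbers here are hypothetical.

```python
import math

def required_instances(peak_rps: float,
                       cpu_seconds_per_request: float,
                       vcpus_per_instance: int,
                       target_utilization: float = 0.6) -> int:
    """Estimate how many instances are needed to serve peak_rps while
    keeping average CPU below target_utilization."""
    cpu_demand = peak_rps * cpu_seconds_per_request      # vCPU-seconds per second
    usable_vcpus = vcpus_per_instance * target_utilization
    return math.ceil(cpu_demand / usable_vcpus)

# Suppose the canary shows 20 ms of CPU per request vs. the old 15 ms,
# at 2,000 req/s peak on 4-vCPU instances:
old_fleet = required_instances(2000, 0.015, 4)
new_fleet = required_instances(2000, 0.020, 4)
```

Seeing that delta on 5% of traffic lets you raise the Auto Scaling group's desired capacity (or revisit the regression) before the new version takes 100% of load.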

Cost Optimization: Maximizing Business Value at the Lowest Price Point

The Cost Optimization pillar is about avoiding unneeded costs and maximizing the business value of your cloud investment. Progressive delivery, while seemingly adding complexity, can actually lead to significant cost savings in the long run.

Consider the contrast between blue/green deployments and canary deployments in terms of infrastructure. While blue/green deployments offer excellent isolation, they often require maintaining two entirely separate, fully scaled production environments (e.g., two distinct Auto Scaling Groups and associated resources), effectively doubling your EC2/Fargate, EBS, ALB, and other resource costs during the deployment phase. Canary deployments, on the other hand, only require a small percentage of your traffic to be diverted to the new version running on a small compute unit, leading to a much lower infrastructure overhead during testing (e.g., spinning up only 10-20% of your typical instance count for the canary group).
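The cost difference is easy to quantify. The sketch below compares the extra instance-hours each strategy incurs during a deployment window; the fleet size, hourly rate, and rollout duration are illustrative.

```python
def deployment_overhead_cost(fleet_size: int,
                             hourly_instance_cost: float,
                             deploy_hours: float,
                             extra_capacity_fraction: float) -> float:
    """Extra instance-hour cost incurred during a deployment window,
    on top of the normally running fleet."""
    extra_instances = fleet_size * extra_capacity_fraction
    return extra_instances * hourly_instance_cost * deploy_hours

# 20-instance fleet at $0.10/hr over a 2-hour rollout:
blue_green = deployment_overhead_cost(20, 0.10, 2, 1.0)   # full duplicate fleet
canary     = deployment_overhead_cost(20, 0.10, 2, 0.10)  # 10% canary capacity
```

With these numbers blue/green costs ten times the canary overhead per rollout; at larger fleet sizes, longer bake times, or frequent deployments, the gap compounds.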

Catching issues early with progressive delivery also reduces the likelihood of costly manual rollbacks, developer time building emergency fixes, breaking SLAs, and reputational damage that can lead to lost revenue in the long term.

The MultiTool Advantage: Elevating Your Well-Architected Journey

Automated progressive delivery, with its focus on iterative, low-risk deployments, continuous feedback, and faster-than-human responses, seamlessly integrates with most of the framework's six pillars. By embracing progressive delivery strategies, organizations can build more operationally excellent, secure, reliable, performant, cost-optimized, and sustainable systems.

At MultiTool, we simplify the implementation of advanced deployment strategies, providing the automation, observability, and control you need to confidently roll out new features, minimize risk, and continuously optimize your applications.

Try the MultiTool Beta today!