Traditionally, IT application releases are carried out during system downtime and scheduled maintenance windows, when application usage is low. This tends to be at weekends or late at night. But what if we could release without the need for these downtime windows, and what if we could release multiple times during busy traffic periods?
Continuous operations is a term used to describe undisrupted application operations without the need for downtime or planned maintenance windows. It is the ability to deploy new releases without any impact on end users: the current application version stays available and in use right up to the point that the new version is running, tested and ready to be made available.
This has many advantages over previous deployment models that use downtime windows to implement changes. The first is that there is no downtime. Depending on the business and the application, downtime can cost thousands of dollars per hour in lost revenue. By removing the downtime we remove that cost: customers can keep accessing our systems and creating sales, staff are not left waiting on change completion, and people can simply get on with their work.
Recovery from failed deployments is also faster, because the previous application version is still available and traffic can be directed back to it if the change needs to be regressed. This option can be actioned instantly, so no extension to downtime needs to be explained to the customer or to users who can’t get back to their work.
Another advantage is the ability to deploy at any time, including during the ‘online day’, when more support staff are available to assist users with any incidents that result from the change. Issues can be detected and solved more quickly than under the traditional late-night or weekend downtime model, where the next working day can see the application come to a grinding halt as users discover errors and there is no option of regressing the change. The other side to this is that changes can be smaller, so issues are easier to detect, diagnose and fix.
To do this we use something called canary releases, an approach that has been championed by large IT-driven organisations such as Netflix and Facebook. The idea is to deploy the new release and run it alongside the current application version, exposing only a small percentage of users to it. If there are no issues, all users can be directed to the new release and the current (now old) application version switched off. It is a good idea to keep the previous version available rather than destroying it, for redundancy and disaster recovery reasons, in case any issues appear further down the line.
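As a rough illustration of how a small percentage of users might be routed to the new release, here is a minimal sketch. It assumes each user has a stable identifier and that traffic is split by hashing that identifier into one of 100 buckets. The function names `bucket_for` and `choose_version` are hypothetical, and real load balancers or service meshes may split traffic quite differently.

```python
import hashlib


def bucket_for(user_id: str) -> int:
    """Map a user ID to a stable bucket in the range 0-99 (illustrative scheme)."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_version(user_id: str, canary_percent: int) -> str:
    """Return "canary" for users whose bucket is below canary_percent, else "current"."""
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    return "canary" if bucket_for(user_id) < canary_percent else "current"


if __name__ == "__main__":
    # Hypothetical users; with a 5% setting only a small share should see the canary.
    users = [f"user-{n}" for n in range(1000)]
    exposed = sum(choose_version(u, 5) == "canary" for u in users)
    print(f"{exposed} of {len(users)} users routed to the canary")
```

Because the bucket depends only on the user ID, a given user stays on the same version between requests, and raising the percentage only ever adds users to the canary group. Setting the percentage to 100 directs everyone to the new release, and setting it to 0 sends everyone back to the current version.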
This approach can also be used for A/B testing, where a new feature is tested with a small user base and, if accepted as an improvement, is then rolled out to the entire user base.
So how exactly is this enabled? Automation is key, along with a strong continuous delivery pipeline and the associated processes and governance to enable this release type.
Let’s look into the delivery pipeline first. To be able to deploy releases this way, cloud environments are highly recommended, whether private or public cloud. It would not be impossible with traditional infrastructure, but it would be complicated and have drawbacks and bottlenecks. Using cloud environments enables new, standard environments to be created for each release and older ones to be turned off and removed. Standard environments that are the same each time they are created (infrastructure as code) are key to reliable, repeatable releases.
Next, all the deployment activities (installation, orchestration and configuration) should be automated using agreed, standard tooling. This reduces the scope for human error, and because the deployment activities are consistent across environments (test, pre-production and production) they can be tested beforehand. Acceptance testing should also be automated to a degree, allowing a wide range of tests to be passed before any users access the new application version. These should include synthetic transactions that mimic user actions, to ensure performance and functionality aren’t degraded for users. These automated procedures give consistent results and decrease deployment failures.
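The idea of gating a release on synthetic transactions can be sketched as follows. This is a minimal example under stated assumptions: each synthetic transaction is modelled as a plain Python callable that returns True on success, and a single time budget stands in for a performance threshold. The names `run_synthetic_checks`, `release_gate` and `CheckResult` are hypothetical and not taken from any particular tool.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class CheckResult(NamedTuple):
    name: str
    passed: bool
    seconds: float


def run_synthetic_checks(
    checks: Dict[str, Callable[[], bool]], max_seconds: float
) -> List[CheckResult]:
    """Run each synthetic transaction and record whether it succeeded within the time budget."""
    results = []
    for name, check in checks.items():
        started = time.perf_counter()
        try:
            ok = bool(check())
        except Exception:
            # Any exception from a synthetic transaction counts as a failed check.
            ok = False
        elapsed = time.perf_counter() - started
        results.append(CheckResult(name, ok and elapsed <= max_seconds, elapsed))
    return results


def release_gate(results: List[CheckResult]) -> bool:
    """Open the gate only if at least one check ran and every check passed."""
    return bool(results) and all(result.passed for result in results)


if __name__ == "__main__":
    # Placeholder checks; real ones would drive the new version like a user would.
    checks = {"login": lambda: True, "search": lambda: True}
    results = run_synthetic_checks(checks, max_seconds=2.0)
    print("release can proceed" if release_gate(results) else "release blocked")
```

Running the same checks in test, pre-production and production is what makes the results comparable between environments.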
The other side of the equation is the process and governance that enables continuous operations. This can be done using a modified ITIL model, with changes that create ‘standard’ release practices which do not need to be run through a full approval process for each release. Standard release practices should be created, and revised whenever part of the automated release process is altered. Once a practice is baselined, it should be possible to run it with similar release packages without the need for an approval process. This gives the speed and freedom required to release more often whilst keeping the risk manageable through standard practices.
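One possible way to decide automatically whether a release still counts as ‘standard’ is to fingerprint the automated release process and compare it with the approved baseline. The sketch below assumes the release process can be described as a dictionary. The names `pipeline_fingerprint` and `needs_full_approval`, and the example pipeline contents, are hypothetical and only illustrate the idea.

```python
import hashlib
import json


def pipeline_fingerprint(pipeline: dict) -> str:
    """Hash a canonical JSON form of the release process definition."""
    canonical = json.dumps(pipeline, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def needs_full_approval(pipeline: dict, approved_fingerprint: str) -> bool:
    """A release needs full approval when its process differs from the approved baseline."""
    return pipeline_fingerprint(pipeline) != approved_fingerprint


if __name__ == "__main__":
    # Hypothetical baselined release process.
    baseline = {"steps": ["provision", "install", "configure", "test"], "rollback": True}
    approved = pipeline_fingerprint(baseline)

    # Same process, so the release can go ahead as a standard release.
    print(needs_full_approval(dict(baseline), approved))

    # An altered process should go back through approval and be re-baselined.
    altered = {**baseline, "steps": ["provision", "install", "test"]}
    print(needs_full_approval(altered, approved))
```

The release package itself is deliberately left out of the fingerprint in this sketch, since the point of a standard release is that similar packages can reuse an approved process.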
In addition, speed of response to any issues is important, and the operations team should be fully empowered to decide to regress to the current baseline if required. There should also be clear guidance on when the full user base can be given access to the newly deployed version. This will reduce the mean time to resolution of issues and increase the stability of the application. It is better for a release to be rejected from live and fixed quickly using DevOps practices, that is, to ‘fail fast’, than to have errors in production environments.
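Clear guidance on when to regress can be written down as a simple rule. The sketch below is one hypothetical rule, not a recommended threshold: it compares the error rate of the new version with that of the current baseline and signals a regression when the difference exceeds a tolerance. The function name `should_roll_back` and the default values for `tolerance` and `min_requests` are assumptions made for illustration.

```python
def should_roll_back(
    canary_errors: int,
    canary_requests: int,
    baseline_errors: int,
    baseline_requests: int,
    tolerance: float = 0.01,
    min_requests: int = 100,
) -> bool:
    """Return True when the new version's error rate exceeds the baseline's by more than tolerance."""
    if canary_requests < min_requests:
        # Too little traffic on the new version to judge it either way.
        return False
    canary_rate = canary_errors / canary_requests
    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    return canary_rate > baseline_rate + tolerance


if __name__ == "__main__":
    # Hypothetical figures: 10 errors in 200 requests on the new version,
    # against 2 errors in 2000 requests on the current baseline.
    if should_roll_back(10, 200, 2, 2000):
        print("regress to the current baseline")
    else:
        print("continue the rollout")
```

The same rule, read the other way, gives a condition for when the full user base can be moved to the new version: enough traffic has been observed and the error rate has stayed within tolerance.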
This is, of course, a very high-level discussion of the multiple ‘moving parts’ involved in any application environment, where multiple teams and sometimes vendors are involved. Hopefully it explains a little about continuous operations and highlights some of the key considerations.
Continuous operations can be extended with the ability to detect issues and correct them before they have an impact on users (lights-out, end-to-end operation) and with automated request fulfillment through self-service, but these will be covered separately. Only with these concepts will operations have the agility and responsiveness to be considered truly continuous.