Learn how to accurately show the value DevOps is delivering to your organisation
There are many factors to bear in mind when measuring the return on investment of cloud and DevOps. Unfortunately, some organisations only consider the cost side of the ROI and forget about the other areas that add value. That’s why it’s important to have good measurements in place before starting your transformation journey and to have a clear narrative around what changes you are implementing, what value they will return to your organisation, and why it’s important to articulate it correctly.
What value is there in a cloud and DevOps transformation?
Let’s talk about the elephant in the room: saving money does not equate to value. You can save all your money by doing nothing, but what value does that provide? That’s right: none.
To help articulate this, we will discuss cloud and DevOps as separate entities, as doing so lets us compare the value of one with the other.
The main (and potentially only) value that the cloud provides is flexibility. I used to say it was flexibility and agility, but if you think about it, everything comes down to how flexible it is. You can spin up workloads ad-hoc and turn them off again, you can provision new services at the click of a button, and you can launch a product on the other side of the planet from your bedroom. So, the true value of the cloud is flexibility.
With that in mind, other elements become variables in the equation: cost, time, and efficiency. They may be fixed, but your cloud usage will always be flexible.
If you have a cloud platform that is not giving you the flexibility to react quickly and deploy services where you need them, then you are not getting value from it. Bear that in mind next time you go to release a new application: how easy was it? If it was hard, you don’t really have a cloud. You have a dynamic infrastructure provider for ad-hoc resources.
Unfortunately, DevOps is a bit more nuanced than the cloud. DevOps does not provide tools or technology, although these are necessary to achieve the value of DevOps. The value of DevOps itself is that it gives you a way of working that ensures you take a scientific approach to releases and gradually improve upon the status quo.
By adopting common DevOps approaches (such as Agile, CI/CD, and SRE) you increase your ability to release software consistently, with processes that allow you to compare A to B in a scientific way and confirm that you are making your solutions better.
Creating a compelling narrative
Now that you’ve learned the real value of both cloud and DevOps, you can categorise the other elements of value that your business perceives. These could differ between teams in an organisation, and you may need to change the priorities to ensure the overall business sees what it perceives as ‘value’ as soon as possible.
Cloud value articulation
With the core value of the cloud being flexibility, it is important to tailor the measurements below to what your organisation perceives as valuable. To help, I’ll cover some core metrics that are commonly used.
Cost is an obvious key metric – both the total cost of cloud usage and individual project/program cost. You must keep tight control of costs as you use the cloud and break costs down into three levels: individual applications, environments, and programs. This will generate the data you need to justify the work, but bear in mind that costs may increase, not because the cloud is more expensive but because you can do more with the same resources.
For example, let’s say you implement Ten10’s PTaaS solution and you can simply deploy environments. You may find that you go from running three environments on-premises to over 50 in the cloud. Overall, you are spending more money, but you can also test more features in parallel, reduce your mean time to recovery (MTTR), increase stability in your platform, and recreate issues between environments.
My advice here would be to present the costs back to the business in two forms in addition to the overall cost:
- The cost per environment: How much a like-for-like solution costs before and after the transformation. This shows that even if the overall cost of cloud is higher, the cost of delivering each environment may be reduced.
- Cost per capita: How much it costs for a DevOps team to support that environment. For example, if each platform or solution previously had its own team supporting its services and three environments, but now you have one team supporting multiple environments and multiple platforms or solutions, you have reduced the per capita cost of supporting those environments.
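As a rough sketch of how those two figures could be worked out, here is a small Python example. Every number in it is made up for illustration, the `PlatformCosts` class and its field names are my own invention, and I have assumed that "cost per capita" means the support team cost divided across the environments it looks after. Swap in whatever breakdown your finance team actually uses.

```python
from dataclasses import dataclass


@dataclass
class PlatformCosts:
    """Hypothetical monthly figures for one snapshot (before or after)."""
    total_infra_cost: float    # total monthly infrastructure spend
    environments: int          # number of environments running
    support_team_cost: float   # monthly cost of the team(s) supporting them

    def cost_per_environment(self) -> float:
        # Infrastructure spend shared across every environment
        return self.total_infra_cost / self.environments

    def support_cost_per_environment(self) -> float:
        # Assumed reading of "cost per capita": support cost per environment
        return self.support_team_cost / self.environments


# Illustrative numbers only: 3 on-premises environments versus 50 in the cloud
before = PlatformCosts(total_infra_cost=30_000, environments=3, support_team_cost=45_000)
after = PlatformCosts(total_infra_cost=100_000, environments=50, support_team_cost=60_000)

for label, snapshot in (("before", before), ("after", after)):
    print(
        f"{label}: total {snapshot.total_infra_cost:,.0f}, "
        f"per environment {snapshot.cost_per_environment():,.0f}, "
        f"support per environment {snapshot.support_cost_per_environment():,.0f}"
    )
```

With these invented figures the total spend more than triples, yet the cost of each environment and the cost of supporting it both fall, which is the story you want the business to see alongside the headline number.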
Disaster recovery/business continuity/MTTR
Another metric I find useful is how long it would take for you to recover from X. For example, if you have a solution in a data centre and the whole data centre fails, how long would it take to restore service? Perhaps you switch to your disaster recovery data centre and it is quicker, but how long would it take if you needed to deploy a new one?
The great thing about the cloud is the programmatic interface. Being able to treat infrastructure as code means that you can deploy your solution across multiple availability zones or regions and, if there is an issue, switch from one to another. If you have set up your DevOps processes correctly, there should not be any real difference between carrying out a release and enacting a recovery of some kind.
Consider reporting and measuring the following:
- How long it takes to rebuild the solution from scratch (excluding data): This would be a typical major incident if you had to carry out a rollback or replace a server but your data was okay.
- How long it would take to migrate from Cloud A to Cloud B if required: This would be your disaster recovery or business continuity scenario, depending on the reason.
- Recovery time from an infrastructure-related issue: If done well, it should be the same as the first bullet point.
You can take these numbers and put a monetary value on them based on the sales or income lost while the service was not available. When compared to the original solutions, these numbers provide two details: how much revenue loss has been mitigated by your cloud journey and how much money would be saved in the future if you need to change provider again.
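The arithmetic behind that comparison is simple enough to sketch in a few lines of Python. The recovery times and the revenue-per-hour figure below are placeholders I have invented, and the function names are hypothetical. A flat hourly revenue is also a simplifying assumption, since real losses usually vary by time of day and season.

```python
def downtime_cost(recovery_hours: float, revenue_per_hour: float) -> float:
    """Revenue lost while the service is unavailable (flat hourly rate assumed)."""
    return recovery_hours * revenue_per_hour


def revenue_loss_mitigated(
    old_recovery_hours: float,
    new_recovery_hours: float,
    revenue_per_hour: float,
) -> float:
    """Difference in lost revenue between the original and the new solution."""
    return downtime_cost(old_recovery_hours, revenue_per_hour) - downtime_cost(
        new_recovery_hours, revenue_per_hour
    )


# Illustrative numbers only: 8 hours to rebuild before, 30 minutes after
saving = revenue_loss_mitigated(
    old_recovery_hours=8.0,
    new_recovery_hours=0.5,
    revenue_per_hour=5_000,
)
print(f"Revenue loss mitigated per incident: {saving:,.0f}")
```

Run the same calculation for each of the three scenarios in the list above and you have a per-incident figure you can multiply by how often each type of incident has historically occurred.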
DevOps value articulation
We discussed earlier that the key value of DevOps is that it gives you a set of approaches that allow you to work differently. By having a continuous improvement mindset, you can increase automation and in turn improve consistency, efficiency, and stability.
Support plays an important part in a move to DevOps ways of working. As an organisation, you have moved from having people looking after different parts of the infrastructure and application to simply consuming the infrastructure and worrying about the application. But if we exclude those relatively straightforward numbers of people, there are additional cost savings around support that aren’t always considered. They come as a combination of a journey to the cloud and correctly implementing DevOps principles.
When considering problem diagnosis, the approach with the cloud (with the correct DevOps principles in place) is very different. You no longer spend time diagnosing issues. Why would you do that when you have a service issue? Instead, it is better to fix the problem and then come back to the operational metrics and logs to diagnose the issue or investigate the problem on the original environment after you have moved the workload onto new instances and services.
The same work is being done, just in a different order – an order that would not have been possible without the flexibility provided by the cloud to simply spin up another environment and transfer the traffic to it. This has a drastic impact on your support costs, time, and serviceability as you can be focused on restoring service rather than investigating the issue to get the right solution.
When articulating the ROI of your cloud, it is worth calling out the new capabilities that have been created as a result of having the increased flexibility. To help measure this, look at your MTTR as discussed above. You may also want to measure the number of incidents and severity. If you are following good principles, you should apply a continuous improvement approach which increases the reliability of the platform over time, particularly if you are using a site reliability engineering approach.
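If your incident tooling can export a start time, a restore time, and a severity for each incident, both measures fall out of a few lines of Python. The incident records and the `P1`/`P2` severity labels below are invented for illustration, so treat this as a sketch of the calculation rather than a reporting tool.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical incident log: severity, when service was lost, when it was restored
incidents = [
    {"severity": "P1", "start": datetime(2024, 1, 3, 9, 0), "restored": datetime(2024, 1, 3, 9, 45)},
    {"severity": "P2", "start": datetime(2024, 1, 10, 14, 0), "restored": datetime(2024, 1, 10, 14, 30)},
    {"severity": "P2", "start": datetime(2024, 1, 21, 22, 0), "restored": datetime(2024, 1, 21, 22, 15)},
]


def mean_time_to_recovery(log: list[dict]) -> timedelta:
    """Average time from loss of service to restoration."""
    durations = [entry["restored"] - entry["start"] for entry in log]
    return sum(durations, timedelta()) / len(durations)


def incidents_by_severity(log: list[dict]) -> Counter:
    """Number of incidents at each severity level."""
    return Counter(entry["severity"] for entry in log)


mttr = mean_time_to_recovery(incidents)
counts = incidents_by_severity(incidents)
print(f"MTTR: {mttr}")               # MTTR: 0:30:00
print(f"By severity: {dict(counts)}")
```

Tracking both numbers per month or per quarter lets you show the trend, which matters more than any single value.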
One of the harder aspects to quantify, but certainly the most valuable element of DevOps, is the change in culture. A lot of organisations will talk about it, some even push their culture, but very few measure it.
There are a few ways you can measure the impact of cultural change. You could utilise surveys to measure collaboration or people’s continuous improvement mindset. This is great for long-term trend analysis but there are easier indicators that will give you a feel for the culture change:
- Knowledge-sharing sessions: One element of DevOps is collaboration. You can’t have collaboration without sharing, so tracking the number of knowledge-sharing sessions being created and attended can indicate a culture shift.
- Ease of access to the right people: When it comes to DevOps, there is nothing more important than resolving the issue and making the problem go away. In mature organisations, engineers throw themselves into the problem, no questions asked. In less mature organisations, the response is typically ‘you better check with X first’. The only way to realistically measure this is with an intelligently designed maturity assessment.
- Technical debt prioritisation: In effective DevOps cultures, technical debt and toil are highly prioritised because there is an understanding that technical debt leads to poor solutions and services. By tracking a team’s technical debt, you can understand whether it is being prioritised correctly. If it isn’t, you can dig into whether the cause is the management culture or the team culture later. Low technical debt is generally better, but very low technical debt can also indicate a problem.
Tracking these elements means you can demonstrate and articulate where you are on a transformation journey in terms of culture. With an improved culture, we know we often have lower MTTR, increased productivity, and increased satisfaction.
Release frequency and release cycle
One of the advantages of DevOps is improved release frequency. A lot of organisations measure this and, when things are going their way, shout their results from the rooftops. But that only tells you how hard you are working. It does not tell you how effective you have been.
Take this example: You currently have a release once a quarter. That’s four a year. Through tooling and technology improvements this increases to one a month. You have increased your release frequency from four to twelve: a threefold increase. The next factor is release cycle time. It used to be three months. To achieve twelve releases, you have to decrease the release cycle by two-thirds, increase productivity, or do a combination of both.
While release frequency tells you how many you are doing, the release cycle time tells you how long each one takes. Therefore, if you go from four a year to twelve but your cycle time hasn’t changed, you may just be working three times as hard.
Release frequency is simple to measure – it’s how many releases made it to production over a period of time. The release cycle is a bit more challenging because you first need to decide what constitutes a release. It depends on your approach to Agile, but if you are using Scrum, then the release cycle time runs from the sprint start to the code being in production. For Scrum, a good time is two weeks. If you are using Kanban and pushing directly to the main branch, it would be from task started to task deployed to production. This will depend on the complexity of the task, but I would suggest that an average of one day or less is good.
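To make the difference between the two measures concrete, here is a minimal Python sketch. It assumes you can record a start date (sprint start or task start, depending on your approach) and a production deployment date for each release. The dates and function names are invented for illustration.

```python
from datetime import date
from statistics import mean

# Hypothetical releases as (work started, deployed to production)
releases = [
    (date(2024, 1, 1), date(2024, 1, 15)),
    (date(2024, 1, 15), date(2024, 1, 29)),
    (date(2024, 1, 29), date(2024, 2, 15)),
]


def release_frequency(log, period_start: date, period_end: date) -> int:
    """How many releases reached production within the period (inclusive)."""
    return sum(1 for _, deployed in log if period_start <= deployed <= period_end)


def average_cycle_time_days(log) -> float:
    """Mean number of days from work starting to being in production."""
    return mean((deployed - started).days for started, deployed in log)


january = release_frequency(releases, date(2024, 1, 1), date(2024, 1, 31))
cycle = average_cycle_time_days(releases)
print(f"Releases in January: {january}")
print(f"Average cycle time: {cycle} days")
```

Reporting the two side by side is the point: a rising frequency with a flat cycle time suggests the team is simply working harder, while a rising frequency with a falling cycle time suggests the process itself has improved.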