Embracing platform engineering for efficiency and governance
Since the advent of DevOps, several different ways to implement it have arisen. What we all failed to do is ask whether we should implement it in those ways. Now, with platform engineering, we have an opportunity to get 80% of the value for 20% of the effort, with better control over security, cost, and compliance with standards, while still allowing the agility that DevOps demands.
Foundation for success
For platform engineering to be successful, you cannot simply implement it as DevOps or SRE and call the result a platform engineering team. There are some subtle differences that determine whether the platform team succeeds with the rest of the business.
The five foundational elements you need to ensure are:
- Treat platform engineering as a product: The goal is to create an Internal Developer Platform that provides consistency and adherence to compliance, not just a set of tools wrangled together.
- At least 20% of the team’s time is focused on projects they deem important: This includes technical debt reduction.
- Barriers to the adoption of tools or technology are reduced as much as possible: Or else there will be no adoption of the standards.
- Create rigid, unmovable deployment processes that allow for compliance and satisfy the needs of the business: However, you must also allow the business to not consume the platform and instead follow a list of compliance needs through the platform engineering team’s guidance.
- Be pragmatic: It takes time. A lot of time.
There are a few things to bear in mind when starting your platform engineering journey. First, you must be clear on what the business and the end user get out of this. If you create tools and technologies that are hard to consume, no one will use them. They need to just work.
You essentially have two types of customers: those who want to release value for the business and those who want to ensure compliance in some way. You cannot be successful if you only address one of these, or if you try to address every concern. It is far better that people use the platform than not, as you can gradually bring the platform into compliance with the business needs over time. But if the platform engineering solutions do not function, no one will use them (even if their use is dictated), so if you have to focus on one group, choose your consumers.
Your platform engineering team will only have 3 months to prove that it is adding value. To this end, it is worth picking one thing to focus on from day one, be it cost, security or consistency.
Scalability and flexibility
One of the biggest values a platform engineering team can provide a business is taking away the pain of creating scalable architectures. By progressing application development towards standardised design patterns and working with teams to implement new patterns as needed, scalability and resilience are built in by design. However, if the standard patterns are the only option, people will bypass the platform and you will lose visibility.
If the team needs a small change, they could cover the cost of a developer or use one of their developers to help extend the platform with guidance. If they want to go full off-piste then you should have a well-documented set of guidelines around what can and cannot be done at each part of the journey including what architectures need to be followed. These should be done consultatively. The idea is not to force the consumer to follow a specific process or design but to outline the constraints they must adhere to.
For example, maybe your platform engineering team use syslog to share/store log files. The requirement is not to use syslog, it is to ensure logs are available for a period of time. By giving the constraints in this way the team may use a different tool that visually represents the logs and provides indexed searching – a technically better solution that you may not have come to if you had not allowed the freedom to explore.
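The difference between mandating a tool and stating a constraint can be sketched in a few lines of Python. This is only an illustration: the `LoggingSetup` type, the `meets_log_constraint` function, the tool names, and the 90-day retention figure are all assumptions made for the example, not requirements from any real standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoggingSetup:
    """What a team declares about its logging, whatever tool it chose."""
    tool: str            # e.g. "syslog" or an indexed search product
    retention_days: int  # how long logs remain available


# Hypothetical constraint: the 90-day figure is an assumption for illustration.
MIN_RETENTION_DAYS = 90


def meets_log_constraint(setup: LoggingSetup) -> bool:
    """Check the outcome (retention) only; the tool name is never inspected."""
    return setup.retention_days >= MIN_RETENTION_DAYS


platform_default = LoggingSetup(tool="syslog", retention_days=90)
team_choice = LoggingSetup(tool="indexed-search", retention_days=180)
too_short = LoggingSetup(tool="syslog", retention_days=30)

for setup in (platform_default, team_choice, too_short):
    status = "ok" if meets_log_constraint(setup) else "does not meet constraint"
    print(f"{setup.tool} ({setup.retention_days} days): {status}")
```

Because the check never looks at `tool`, a team that picks a different product passes as long as the logs stay available for long enough, while a syslog setup with short retention does not.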
When it comes to where to start, it depends on your audience. What are the most commonly used solutions? Is it Compute? Is it containers? Is it FaaS? Whatever the most used is (or, more accurately, whichever one is desired the most) go with that and try to ensure the business gets something in return such as better visibility of costs.
Compliance by design
As mentioned above, with this level of automation around the application, configuration, infrastructure, and centralised support mechanisms, compliance with various security and regulatory standards becomes far simpler. Anyone who is using the platform becomes compliant. You therefore know that their business continuity and disaster recovery plans have been tested and are working, that they are patching their operating systems, and that they are logging correctly. They automatically become compliant with the change process, the incident process, and anything else you require them to adhere to.
If you combine this with the advice above to allow people to bypass the platform, you also have visibility of those who are on the journey to compliance, and you are actively able to enforce it, not to mention the ones that just needed minor changes to adopt the platform. The platform engineering team needs to be aware of the exceptions and actively track them, as these are the clients it is not currently able to satisfy. Getting as many of these services as possible onto the platform should be prioritised to help reduce the risk of non-compliance.
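One lightweight way to track exceptions is a simple register that records why each service is off the platform and roughly how much work it would take to bring it on. The sketch below is hypothetical: the `Service` fields, the sample service names, and the "smallest effort first" ordering are assumptions chosen for illustration, and a real team may well prioritise by risk instead.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    name: str
    on_platform: bool
    blocker: Optional[str] = None      # why the service bypasses the platform
    effort_days: Optional[int] = None  # rough estimate to remove the blocker


def exceptions_by_effort(services: List[Service]) -> List[Service]:
    """Return off-platform services, smallest estimated effort first.

    Services with no estimate yet are listed last.
    """
    off_platform = [s for s in services if not s.on_platform]
    return sorted(
        off_platform,
        key=lambda s: (s.effort_days is None, s.effort_days or 0),
    )


# Sample data, invented for the example.
register = [
    Service("billing", on_platform=True),
    Service("legacy-batch", on_platform=False, blocker="needs VM pattern", effort_days=20),
    Service("reports", on_platform=False, blocker="custom log format", effort_days=3),
    Service("research", on_platform=False, blocker="not yet assessed"),
]

backlog = exceptions_by_effort(register)
for service in backlog:
    print(service.name, service.blocker, service.effort_days)
```

Ordering by effort surfaces the services that only need minor changes, which matches the goal of moving as many services as possible onto the platform.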
Hybrid and multi-cloud
It largely does not matter these days if you are in a single-cloud, multi-cloud or hybrid environment. You now have a mechanism to integrate wherever you need to and to manage deployments or migrations to new clouds centrally. Ultimately, even if you are entirely on-premise, if you have architected your platform correctly there should be no impact from migrating to a new cloud provider. It will just be a matter of implementing the correct services and allowing the migration to happen as part of a release process.
By consolidating how software is written and deployed in your organisation, you can increase flexibility and decrease the risk of non-compliance. Your success in this area will depend on how you manage the process and ensure that the services provided continue to improve and provide value back to the business while being incredibly easy for the teams to consume.
Reducing the barrier to adoption, and giving people a valid way of bypassing the platform without the feeling of being put on trial, will ultimately increase consistency and visibility around the risks in your business. From here you can make sensible strategic decisions that protect the business and ensure common change processes are followed.