
What should platform engineering look like?

Platform engineering can help IT infrastructure and operations teams become faster, more responsive and more empathetic towards internal customers

Platform engineering is based on the principles of product management and the product model applied to digital and IT systems. Fast-moving digital teams show resistance to strict process frameworks such as the Information Technology Infrastructure Library (ITIL) and IT service management (ITSM), and autonomous digital or IT product teams are becoming self-sufficient, reducing the need for traditional infrastructure engineers.

Platform engineering, grounded in product management principles, offers an approach to modernising IT operations. By injecting product thinking into platform teams, Forrester believes technology organisations can position themselves for the future.

What is platform engineering?

Forrester has compiled a capability model for platform engineering that includes frequently covered technical aspects and less frequently covered management capabilities. It is an inventory of things you should think deeply about and ensure you have covered via your organisational resources, which might include not only dedicated organisations, but also cross-functional processes, enablement teams, or other mechanisms.

Your capabilities are how your customers experience the platform. They are your front door, so to speak. Your customers will discover your platform, onboard onto it, provision it, interact with its application programming interfaces (APIs), leverage patterns for security and performance, and call for help via these capabilities. And no, there is no such thing as an entirely automated self-service platform.

Users and developers need to be able to discover the platform and its services. Managing your platform like a product means you understand the onboarding journey of users and invite them to be part of the process of defining – and even contributing to – developer platform capabilities.

They will expect easy, frictionless authorisation and access, with few, if any, human-in-the-loop workflow-based approvals. Once provisioned and actively developing, they will need information about the ongoing status of the services they are consuming.

Usually, larger organisations will have a service catalogue or portal capability for IT services. If this does not exist, you must fund and create it. Developer-focused portals – for example, Spotify Backstage, Harness Internal Developer Portal, Atlassian Compass – are gaining popularity. Toyota of North America, for instance, includes consumable blueprints, a discoverable software catalogue, education and training resources, and operational reporting for FinOps and other metrics in its developer portal.

Access to platform services and resources is typically a two-stage process, with initial provisioning (setting up accounts) followed by day-to-day demand (provisioning virtual machines, clusters, and so on). While setting up the account may require some human approvals, day-to-day demand requires API access.

A platform that cannot provision, configure and manage base resources via APIs is not a true platform. Typically, platforms support APIs to instantiate and configure required resources, such as processing nodes, data stores, queues, pipelines and observability probes. There are significant API design questions. Many organisations have API engineering capabilities, but may not have explored the nuances of supporting self-service provisioning.
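As an illustration only, API-driven provisioning of this kind might be sketched as follows. The resource kinds come from the paragraph above; the `Platform` class, its method names and the in-memory storage are hypothetical stand-ins, not any vendor's or Forrester's API.

```python
from dataclasses import dataclass, field
from itertools import count

# Resource kinds named in the article; everything else here is hypothetical.
RESOURCE_KINDS = {
    "processing_node", "data_store", "queue", "pipeline", "observability_probe",
}


@dataclass
class Platform:
    """In-memory stand-in for a platform's provisioning API (assumption)."""
    resources: dict = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def provision(self, tenant: str, kind: str, config: dict) -> str:
        """Create a resource for a tenant and return its identifier."""
        if kind not in RESOURCE_KINDS:
            raise ValueError(f"unsupported resource kind: {kind}")
        resource_id = f"{kind}-{next(self._ids)}"
        self.resources[resource_id] = {
            "tenant": tenant, "kind": kind, "config": dict(config),
        }
        return resource_id

    def configure(self, resource_id: str, **changes) -> dict:
        """Apply configuration changes to an existing resource."""
        self.resources[resource_id]["config"].update(changes)
        return self.resources[resource_id]["config"]

    def deprovision(self, resource_id: str) -> None:
        """Remove a resource; raises KeyError if it does not exist."""
        del self.resources[resource_id]
```

A tenant team would call `provision("team-a", "queue", {...})` and later `configure` or `deprovision` the returned identifier, with no ticket or human approval in the loop. A real platform would put authentication, quotas and persistence behind the same three operations.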

Users of the platform also require ready access to documentation on how to use it. How will this documentation be created and maintained? Typically, a wiki is used for core system quick starts and how-to guides. Forrester recommends documenting patterns as code and managing them via source control. It is also advisable to define the processes, roles and responsibilities for those in charge of these resources. Saying that it is everyone’s responsibility is tempting, but that approach does not work at scale or in the long run.

Support is another key capability. Platforms are typically highly leveraged. Users building tenant applications may not understand the system. The system may not behave as expected. For these and other reasons, you will likely need some level of on-call support. Human contact is required, even in the age of ChatGPT.

Most organisations have ticketed support management, based on products such as those from BMC Software and ServiceNow. This may be used to support the base platforms, and tenant applications may leverage it. However, as Forrester notes, fewer have a robust major incident/critical event management capability, which is essential. Such capabilities are based on products like PagerDuty or Everbridge.

Operational capabilities 

The focus for many platform engineering architectures and frameworks is the operational capabilities, especially those that are more technical. While there are many kinds of infrastructure platform components, the fundamental DevOps chain capabilities appear in most platform engineering discussions.

Forrester recommends that deployments and operational architectures are controlled for governance and policy. Increasingly, this is done as code, such as through Open Policy Agent and similar approaches. Required design patterns, configurations and hardening standards should all be checked. Are software-bill-of-materials (SBOM) checks mandatory? What are the consequences if they fail? If there is a change management process, how is risk calculated? Are chaos tests recommended or required by policy?
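To make the idea of policy as code concrete, here is a minimal sketch in plain Python. Open Policy Agent expresses policies in its own language, Rego, so this is a stand-in for the concept, not an OPA example. The deployment fields (`sbom`, `hardened_image`, `environment`, `chaos_tested`) and the three rules are assumptions chosen to mirror the questions above.

```python
def evaluate_policy(deployment: dict) -> list[str]:
    """Return policy violations; an empty list means the deployment passes.

    All field names and rules are hypothetical examples.
    """
    violations = []
    # Rule 1: an SBOM must be attached to the deployment.
    if not deployment.get("sbom"):
        violations.append("missing SBOM")
    # Rule 2: the base image must meet the hardening standard.
    if not deployment.get("hardened_image", False):
        violations.append("base image does not meet hardening standard")
    # Rule 3: production deployments must have passed a chaos test.
    if (deployment.get("environment") == "production"
            and not deployment.get("chaos_tested", False)):
        violations.append("chaos test required for production")
    return violations
```

A pipeline could block a release whenever the returned list is non-empty, which answers the question of consequences explicitly: the policy, not an individual reviewer, decides whether a failed check stops the deployment.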

The platform’s direct (administrative/developer) users must be identified and authorised, and the products and applications they are building will require identity and access services, which might be quite different from the services controlling administrator access to the platform. Which are you supporting?

Forrester recommends that IT decision-makers check whether common directory services are available to administrators, whether there is privileged access management, whether multifactor authentication (MFA) is being used, and whether single sign-on and/or directory services are available for users of the tenants. The pipeline needs to offer security testing such as software composition analysis, SBOM generation and static application security testing.

Considering that applications, or workloads, are installed on resources once provisioned, it is useful to have a full set of development pipeline resources within infrastructure platforms. These should include access to source control and package management, perhaps via proxying cloud services such as GitHub or GitLab. 

In addition, the IT infrastructure on which the workload is deployed will require provisioning of base IT resources, which will need to be configured and managed. This is generally achieved through infrastructure automation. IT decision-makers should check whether run-time provisioning is based on Terraform or is hyperscaler-specific. Does the platform provide a proxy layer to a cloud provider?

Once initially provisioned, configuration may be a separate concern – for example, with Red Hat, Chef, or Perforce Software (Puppet) – which can also control for drift. Approaches vary widely, depending on technical feasibility.
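Controlling for drift means comparing the configuration a resource is supposed to have with the configuration it actually has. The sketch below shows only that comparison; it does not represent how any of the named products work internally, and the function name and dictionary-based configuration format are assumptions.

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Map each drifted key to a (desired, actual) pair.

    A key missing on one side is reported with None for that side.
    """
    drift = {}
    # Check every key that appears in either configuration.
    for key in desired.keys() | actual.keys():
        if desired.get(key) != actual.get(key):
            drift[key] = (desired.get(key), actual.get(key))
    return drift
```

An empty result means the resource matches its declared state. Any other result lists the settings to be corrected or, if the change was intentional, folded back into the declared configuration.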

Deployment support

Platform engineering can include AIOps, so IT decision-makers should also look at how the platform itself is monitored and observed, and how operational insights are generated.

What is the relationship between AIOps and action (for example, support)? Forrester recommends that IT decision-makers assess services like monitoring, logging and tracing that are available to tenant applications. How is user experience understood? For instance, an application performance management or AIOps tool might be available as part of the platform for real-time insights that span platforms and encompass the whole IT estate. These insights may then be published on a developer portal.

Finally, Forrester notes the significance of platform reliability. IT decision-makers should assess how the platform itself is managed for resilience, availability and learning. For example, site reliability engineers might have a specific function in defining the platform approach, leading major incident response and retrospectives, and reviewing operations. A retrospective could lead to identifying a risk for which a chaos engineering approach might be used as a control.

Overall, Forrester regards platform engineering as a viable approach to tackle traditional team silos in areas such as compute, storage, networking and middleware, where teams struggle to meet market demands for innovation and employees prefer a collaborative and responsive work environment. As such, product-centric thinking in IT platform management can be used to enhance service delivery.


This article is based on an excerpt of The Forrester platform engineering capability model. The author, Charles Betz, is vice-president principal analyst and leads Forrester’s enterprise architecture team.
