Auto-tech series - Lacework: More automation + more code = more vulnerable

This is a guest post for the Computer Weekly Developer Network written by Ryan Sheldrake in his capacity as Field CTO at Lacework – a company known for its cloud security automation technology.

Sheldrake believes that, with DevOps automation and Infrastructure-as-Code now part of our IT stacks, adding security into the automation mix is no longer merely desirable; it is essential.

Marc Andreessen told us back in 2011 that software was eating the world. Since then, hardware has been on the menu too, starting with software-defined networking and storage, and many organisations are now adopting cloud-native computing. Through this hardware abstraction, we opened the door to Infrastructure-as-Code (IaC) and DevOps automation.

Sheldrake writes as follows to explain the state of the security automation nation…

With cloud-native development, traditional Ops have (at least in theory) been largely automated, Ops considerations have been shifted left within the development cycle and platform engineers can deliver a world of developer self-service.

But IaC automation has added new layers of code, dramatically increasing the attack surface and opening up new security vulnerabilities. With security teams swamped and skills in short supply, the inevitable result is that security needs to be woven more tightly into DevOps automation.

Checking code, running the code

DevOps (and its growing list of cousins: DevSecOps, DevSecBizOps, GitOps, NetOps, FinOps etc.) is all about shifting challenges and problem resolution left (earlier) in the development cycle. Here, the most obvious role for automation is configuration: ensuring devs get Ops right from the start of the cycle. The platform engineer uses infrastructure code (e.g. Terraform) to mandate configuration settings, dependencies and so-called ‘golden images’. The idea is to limit developer options so that the underlying infrastructure is consistent, compliant and reliable.
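To make the ‘golden image’ idea concrete, here is a minimal sketch of a guardrail check a platform team might run over declared compute resources before they are provisioned. The `Resource` shape, image IDs, instance types and tag names are all hypothetical; in practice this job is often done by a policy engine reading Terraform plan output rather than by hand-written Python.

```python
from dataclasses import dataclass, field

# Hypothetical guardrail values; a real platform team would maintain its own.
APPROVED_IMAGES = {"ami-golden-2024-01", "ami-golden-2024-02"}
ALLOWED_INSTANCE_TYPES = {"t3.small", "t3.medium"}
REQUIRED_TAGS = {"owner", "cost-centre"}


@dataclass
class Resource:
    """Simplified, hypothetical view of one compute resource declared in IaC."""
    name: str
    image: str
    instance_type: str
    tags: dict = field(default_factory=dict)


def check_guardrails(resource: Resource) -> list[str]:
    """Return human-readable guardrail violations (an empty list means compliant)."""
    violations = []
    if resource.image not in APPROVED_IMAGES:
        violations.append(f"{resource.name}: image {resource.image!r} is not a golden image")
    if resource.instance_type not in ALLOWED_INSTANCE_TYPES:
        violations.append(
            f"{resource.name}: instance type {resource.instance_type!r} is not allowed"
        )
    missing = REQUIRED_TAGS - set(resource.tags)
    if missing:
        violations.append(f"{resource.name}: missing tags {sorted(missing)}")
    return violations


# Example: a developer picks a custom image and forgets a required tag.
web = Resource("web", "ami-custom-123", "t3.small", {"owner": "payments"})
for problem in check_guardrails(web):
    print(problem)
```

The point of the design is that developers keep self-service, but only within options the platform team has already vetted.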

But when we add ‘secure’ to this list of infrastructure traits to be automated, things become more complicated. Either your security team needs to understand the intricacies of IaC and modern platform engineering, or you need a platform engineer who doubles as a security expert. That is rare, expensive and hard-to-hold talent, and convincing it to focus on the endless, laborious checking of IaC might be a hard sell, especially when this checking can be automated through third-party cloud-native services.

It now seems unthinkable that an organisation of any size would not use code scanning in some form to check modern application code for bugs and known vulnerabilities (CVEs). With developers spinning up instances on demand and frequently using open source software, libraries and long dependency chains, manually auditing and securing the application software in most cloud-native environments would be a very uneconomic (and frustrating) use of skilled time.

The same is now true for infrastructure code, especially when running (micro)services within a multi-cloud environment. The possible combinations of cloud providers and third-party platforms, VMs, containers and Kubernetes distributions are bewildering in themselves. You then need to add in the preferences different development teams may have for identity, key and certificate management; persistent storage or DBaaS; queuing and messaging; and a multitude of networking options.

Securing all this is not just a case of knowing what each of these components is and does. It requires knowledge of the idiosyncrasies of how these infrastructure components interoperate, and of the gotchas hidden in all the different permutations and combinations.

To make things even more complicated, we are also dealing with a moving target. New infrastructure combinations are added constantly by any DevOps engineer with an interesting idea and 30 seconds to write a new line of infrastructure code. Meanwhile, the cloud-native platforms, infrastructure software and services themselves are regularly updated, new vulnerabilities keep emerging and attack vectors are constantly evolving.

In short, expecting any team to secure infrastructure code without automated scanning and tooling is risk-laden, uneconomic and probably unrealistic.
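As a rough illustration of what automated IaC scanning does, the sketch below walks a Terraform plan that has been exported as JSON and flags two classic misconfigurations: security groups exposing sensitive ports to the whole internet, and databases that are publicly accessible or unencrypted. It assumes input shaped like the `resource_changes` section of `terraform show -json` output for the AWS provider; the port list and the two rules are illustrative only, and a real scanner ships hundreds of such checks.

```python
SENSITIVE_PORTS = {22, 3389, 5432}  # illustrative: SSH, RDP, PostgreSQL


def _open_to_world(rule: dict, port: int) -> bool:
    """True if an ingress rule admits `port` from any IPv4 address."""
    if "0.0.0.0/0" not in (rule.get("cidr_blocks") or []):
        return False
    if rule.get("protocol") == "-1":  # "-1" means all protocols and ports in AWS
        return True
    return rule.get("from_port", 0) <= port <= rule.get("to_port", -1)


def scan_plan(plan: dict) -> list[tuple[str, str]]:
    """Return (resource address, finding) pairs for a few risky settings."""
    findings = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        address = rc.get("address", "<unknown>")
        if rc.get("type") == "aws_security_group":
            for rule in after.get("ingress") or []:
                for port in sorted(SENSITIVE_PORTS):
                    if _open_to_world(rule, port):
                        findings.append((address, f"port {port} open to 0.0.0.0/0"))
        elif rc.get("type") == "aws_db_instance":
            if after.get("publicly_accessible"):
                findings.append((address, "database is publicly accessible"))
            if after.get("storage_encrypted") is False:
                findings.append((address, "database storage is not encrypted"))
    return findings
```

In a pipeline, the `plan` argument could be the result of `json.load` on the file written by `terraform show -json plan.out`, so every proposed infrastructure change is checked before it is applied rather than audited by hand afterwards.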

Shifting left, right

The concept of successfully shifting anything left is about more than just getting the initial config right. It is about feeding evolving operations and security considerations back to developers. It is also about flagging concerns early and often as developers are working in their existing flow, to avoid later roadblocks and unnecessary development and test cycles.

DevSecOps, in the form of process automation between security and developers, is now vital to avoid the security gatekeeper scenario. This is where development teams are hit with a long, compound report of security concerns just as they move code into production, forcing a new and often unnecessary development cycle.
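One common way to avoid that end-of-line report is to run checks on every change and block only on findings that are both new and serious, so developers get a short, actionable list while they are still working on the code. The sketch below shows such a gate; the `Finding` fields, severity names and the idea of a stored baseline of already-known findings are hypothetical choices for illustration, not a specific product's behaviour.

```python
from dataclasses import dataclass

SEVERITY_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass(frozen=True)
class Finding:
    """Hypothetical scanner finding; frozen so it can be compared and hashed."""
    rule_id: str
    location: str
    severity: str


def new_blocking_findings(current, baseline, threshold="high"):
    """Findings absent from the baseline whose severity meets the threshold."""
    known = set(baseline)
    limit = SEVERITY_RANK[threshold]
    return [f for f in current if f not in known and SEVERITY_RANK[f.severity] >= limit]


def gate(current, baseline, threshold="high") -> int:
    """Return a CI exit code: 0 lets the change through, 1 blocks it."""
    blocking = new_blocking_findings(current, baseline, threshold)
    for f in blocking:
        print(f"BLOCK {f.severity.upper()} {f.rule_id} at {f.location}")
    return 1 if blocking else 0
```

Known issues in the baseline still need tracking and fixing, but they no longer stop unrelated work; lower-severity findings could be surfaced as warnings in the same run.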

Runtime analysis

Security is no longer just about removing vulnerabilities – and this is where another vital aspect of DevSecOps comes into play.

Security now has to monitor the behaviour of systems at runtime to spot evolving threats: not just looking for known threat vectors, but spotting new and undocumented attacks. In a cloud-native environment this means modelling the normal behaviour of everything: users, developers and platform engineers, containers, VMs, software and processes.

Only with a clear view of the cloud infrastructure and applications at runtime does it become possible to define the ‘normal’ behaviour of cloud-native systems. And only then does it become possible to spot suspicious activity and processes without knowing exactly what you are looking for.

With a granular view of runtime activity, teams combining DevOps and security expertise can spot anomalies: the things that should never happen. Anyone or anything escalating privileges is the obvious example. Others include a workload authenticating to an external LDAP server (the kind of behaviour that could flag Log4j exploitation before its CVE was published), a new admin logging into a type of machine they have never accessed before, or a process that has never previously made an external network connection or a DNS request. Getting more sophisticated, it is possible to set automatic triggers based on the magnitude of error codes, data volumes and similar metrics.
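The ‘never happened before’ idea can be sketched with two simple primitives: a first-seen detector that learns which behaviours each entity exhibits during a baseline period, and a magnitude trigger that fires when a metric rises far above its history. The entity and behaviour strings, the learning switch and the three-standard-deviation default are illustrative assumptions; ML-based behavioural models are considerably more sophisticated than this.

```python
from collections import defaultdict
from statistics import mean, pstdev


class FirstSeenDetector:
    """Flags (entity, behaviour) pairs never observed during a learning period."""

    def __init__(self):
        self.seen = defaultdict(set)  # entity -> set of behaviours observed so far
        self.learning = True

    def finish_learning(self):
        """Stop learning silently; from now on, new behaviour is reported."""
        self.learning = False

    def observe(self, entity: str, behaviour: str) -> bool:
        """Record an event; return True if it is new and learning has finished.

        New behaviour is remembered after the first alert, so it fires only once.
        """
        is_new = behaviour not in self.seen[entity]
        self.seen[entity].add(behaviour)
        return is_new and not self.learning


def magnitude_alert(history: list[float], value: float, k: float = 3.0) -> bool:
    """True if value is more than k population standard deviations above the mean."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        return value > mu
    return value > mu + k * sigma


# Example: a web server that has only ever listened on 443 suddenly calls out.
detector = FirstSeenDetector()
detector.observe("nginx", "listen:443")
detector.finish_learning()
if detector.observe("nginx", "connect:external-ldap:389"):
    print("ALERT: nginx made a connection it has never made before")
```

Neither primitive needs to know what the attack looks like in advance; both only need a trustworthy picture of normal behaviour.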

Inevitably, as with any current article about automation, this brings us to AI and machine learning – and these technologies can play a vital role in modelling this real-time system behaviour within the cloud and application space, increasing the accuracy of alerts. However things are implemented, these types of logical tripwires on runtime cloud operations are becoming a vital part of security. Tightening the collaboration, knowledge sharing and process automation between DevOps and Security is an essential starting point.

Avoid ‘organisational friction’

Finally, it is worth highlighting again the importance of effective communication and collaboration alongside DevSecOps automation. There is a lazy misconception that shifting anything left means dumping new problems onto a superpowered DevOps team. Where this includes new responsibilities for resolving real-time security issues, the potential for what is euphemistically called ‘organisational friction’ is high.

Just as code and infrastructure-as-code issues need to be flagged early and presented to developers in their workflow, real-time security issues need to be handled the right way. To avoid frustrated, burned-out developers and DevOps teams, DevSecOps concerns need to be integrated into developers’ daily stand-ups and sprints, and into the systems they already use (such as Jira and Git), rather than dumped on them as an additional burden alongside their daily workflow.

Sheldrake: No lead required – it’s time to shift left on the long walk to better security-aware code automation.