This is a guest post for the Computer Weekly Developer Network in our Continuous Integration (CI) & Continuous Delivery (CD) series.
This contribution is written by Neil Avery, lead technologist and member of the office of the CTO (OCTO) at Confluent, the company known for its event streaming platform, powered by Apache Kafka, that helps companies harness high-volume, real-time data streams.
Continuous deployment sets the goalposts for application architecture.
It means the system is never turned off and there is no such thing as a big-bang release; instead, new functionality is incrementally developed and released, while old functionality is removed when no longer needed.
The application architecture is decoupled and evolvable.
Event-driven architectures provide both of these qualities. To access new functionality, events are routed to new microservices. This routing of events also helps support CI/CD practices such as A/B testing, blue/green deployments (roll forward, roll back) and the use of feature flags.
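As a minimal sketch of this routing idea, the snippet below picks a downstream consumer for each event based on a feature flag and an A/B traffic split. The service names, flag name and split percentage are all illustrative assumptions, not part of any real Kafka API:

```python
import random

random.seed(1)  # deterministic for the example

# Hypothetical router: choose which service version receives an event,
# based on a feature flag and an A/B percentage split.
def route_event(event, flags, ab_split=0.5):
    """Return the consumer group an event should be routed to."""
    if not flags.get("new_checkout_service", False):
        return "checkout-v1"          # flag off: all traffic to v1
    # Flag on: send a configurable share of traffic to the new version.
    return "checkout-v2" if random.random() < ab_split else "checkout-v1"

flags = {"new_checkout_service": True}
targets = [route_event({"order_id": i}, flags, ab_split=0.2)
           for i in range(1000)]
```

Rolling back is then just a flag change or a split of 0, with no redeployment of the routed-to services.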
Many organisations get started with CI/CD by focusing on decoupling and event-driven microservices. Kafka's prolific adoption not only makes it a good platform for eventing but also means there is a wealth of industry expertise for building this style of application.
Event storage, replay & schematisation
This style of architecture relies on event-storage, event-replay and event schematisation. In production, Kafka stores all events and becomes the source of truth for understanding system behaviour.
You might say it acts like a black-box recorder for events that can be used to replay incidents or scenarios at a later time. For test scenario purposes, events can be copied from production and made available in the CI environment (once desensitised). It also affords a level of regression testing difficult to achieve with non-event-driven systems.
So events provide a built-in primitive that, by its nature, makes it easier for organisations to get started with CI and CD: the inputs and outputs of different components are automatically recorded.
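The "black-box recorder" idea can be sketched with a toy append-only event log that is replayed against two versions of a handler, turning recorded production traffic into a regression test. The class, handler names and tax rate below are made up for illustration; in practice the log would be a Kafka topic:

```python
from dataclasses import dataclass, field

@dataclass
class EventLog:
    """Toy stand-in for a Kafka topic: an append-only event log."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)

    def replay(self, handler):
        """Feed every recorded event to a handler and collect its outputs."""
        return [handler(e) for e in self.events]

def pricing_v1(event):
    return round(event["amount"] * 1.20, 2)   # current behaviour

def pricing_v2(event):
    return round(event["amount"] * 1.2, 2)    # refactored, same behaviour

log = EventLog()
for amount in (10.0, 19.99, 250.0):
    log.append({"amount": amount})

# Regression check: the new version must reproduce the old outputs
# for every recorded event.
assert log.replay(pricing_v1) == log.replay(pricing_v2)
```

Because the inputs are real recorded events rather than hand-written fixtures, this style of check covers cases manual test authors would never think of.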
The decision to build an event-driven system is significant. There are various pitfalls we commonly see, especially when developers are new to this approach:
- Slow builds
A key challenge of the CI build is that test cycles take progressively longer as a project develops. Slow builds affect team velocity and weigh against fast release cycles. To overcome this, build pipelines should be staged and parallelised.
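Staging and parallelising can be sketched as follows: fast suites run first to fail fast, independent suites within a stage run concurrently, and later stages are skipped on failure. The suite names and sleep durations are placeholders for real test runs:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_suite(name, seconds):
    """Placeholder for invoking a real test suite."""
    time.sleep(seconds)
    return (name, "passed")

# Stage 1 gates everything else; suites inside a stage are independent.
stages = [
    [("unit", 0.01)],                              # fast, fail-fast stage
    [("integration", 0.02), ("contract", 0.02)],   # run in parallel
]

results = []
for stage in stages:
    with ThreadPoolExecutor() as pool:
        outcomes = list(pool.map(lambda s: run_suite(*s), stage))
    results.extend(outcomes)
    if any(status != "passed" for _, status in outcomes):
        break  # a failed stage stops the pipeline early
```

The same shape applies directly in CI tools: cheap checks as an early stage, expensive suites fanned out across parallel agents.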
- Limited resources for CI pipeline
As teams grow, the resources required to support the CI pipeline grow with them. The ideal solution is a cloud-based CI environment that scales according to demand. Recommended tools include cloud-hosted Jenkins, AWS CodeBuild/CodeDeploy or CloudBees.
- Inability to reproduce production incidents
Event-driven systems provide a unique advantage here: production events can be copied to non-production environments for reproduction. It is simple to build tooling that not only reproduces incidents but also inspects the events and behaviour within specific time intervals.
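A minimal version of such tooling just selects the recorded events whose timestamps fall inside the incident window, ready to be replayed locally. The field names, timestamps and payloads here are invented for the example:

```python
# Hypothetical incident window, in epoch milliseconds.
incident_start, incident_end = 1_000, 2_000

# Stand-in for events copied out of the production log.
production_events = [
    {"ts": 500,   "payload": "ok"},
    {"ts": 1_200, "payload": "bad-input"},
    {"ts": 1_800, "payload": "retry"},
    {"ts": 2_500, "payload": "ok"},
]

def events_in_window(events, start, end):
    """Return the events whose timestamp falls inside the interval."""
    return [e for e in events if start <= e["ts"] <= end]

window = events_in_window(production_events, incident_start, incident_end)
# `window` can now be replayed against a service in a test environment.
```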
- Manual testing of business functionality
It is common to see manual testing stages used to certify business requirements. However, manual testing should be replaced with automation, and APIs should be designed to support API-based automation tooling. Recommended tools include Apigee, JMeter or REST Assured.
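The shape of such an automated API check can be shown with only the standard library: start a service, call an endpoint, assert on the response. The `/health` endpoint and its JSON body are assumptions for the example; a real pipeline would point the same kind of test at a deployed environment using one of the tools above:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Tiny stand-in service exposing a hypothetical /health endpoint.
class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = json.dumps({"status": "up"}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep test output quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

# The automated check: no human clicking through a UI.
url = f"http://127.0.0.1:{server.server_port}/health"
with urllib.request.urlopen(url) as resp:
    status = resp.status
    payload = json.load(resp)
server.shutdown()
```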
- Insufficient regression testing
It’s important that regression testing strategies are in place. Regression tests should be signed off by the business as new functionality is introduced.
- Lack of knowledge about test tooling for event-driven systems
There are many tools available for testing event-driven systems; we have compiled a comprehensive list at the end of this article.
Generally speaking, the ideal go-live system is based on a ‘straw-man’ architecture: one that contains all of the touchpoints mentioned above and provides an end-to-end system, from dev to release. It becomes very difficult (and costly) to ignore fundamentals and change them retrospectively, so it’s better to get it right from the outset.
From a deployment perspective, the go-live application should have a signature that meets infrastructure requirements, i.e. hybrid cloud, multi-DC awareness and SaaS tooling (managed Kafka – Confluent Cloud). All SaaS and PaaS infrastructure should be configured and integrated, and operational costs understood.
The go-live system is not just the production application, but the entire pipeline that supports the software development lifecycle all the way to continuous deployment; it’s the build pipeline that runs integration, scale, operational testing, and automation. Finally, it supports a runtime with the use of feature flags, auditing and security.
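The runtime feature flags and auditing mentioned above can be combined in one small structure: a flag store that records who changed what, and when. The class, flag name and actor below are illustrative, not any specific flag product's API:

```python
import time

class FlagStore:
    """Hypothetical runtime feature-flag store with an audit trail."""

    def __init__(self):
        self._flags = {}
        self.audit = []  # every change: flag, new value, actor, timestamp

    def set(self, name, enabled, actor):
        self._flags[name] = enabled
        self.audit.append({"flag": name, "enabled": enabled,
                           "actor": actor, "ts": time.time()})

    def is_enabled(self, name):
        # Unknown flags default to off, so new code paths are opt-in.
        return self._flags.get(name, False)

flags = FlagStore()
flags.set("new_checkout", True, actor="release-bot")
```

In production the store would be backed by shared, persistent state (a compacted Kafka topic is a natural fit), but the contract is the same: toggle behaviour at runtime and keep an auditable record of every change.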
Every application will have a unique set of constraints that can dictate infrastructure.
For event streaming applications delivered using CI/CD, the recommended tools and infrastructure would include:
- Language runtime: Java via Knative + GraalVM
- Kafka Clients: Java Client and Kafka Streams via Quarkus
- Confluent Cloud: a managed Kafka service in the cloud (AWS, GCP, Azure), including Schema Registry and KSQL
- Datadog: SaaS monitoring
- GitHub/GitLab: SaaS source repo
- CI environment: SaaS-based build pipeline that supports cloud autoscaling (cloud-hosted Jenkins, CloudBees, AWS CodeCommit/CodeBuild/CodePipeline/CodeDeploy)
Event-driven applications also have particular requirements that call for special tools for testing and automation.