Data ‘observability’ key in keeping pace with software evolution

Aaron Tan

Informa TechTarget

With software moving to cloud architectures and away from monolithic designs, organisations will need to establish intelligent “observability” so they can make better decisions.

This is essential to cut through the manual processes that teams across business, development, and operations must plough through to get the data insights they need. Better observability will provide those answers in real time, according to Rafi Katanasho, Dynatrace’s Asia-Pacific chief technology officer and vice-president of solution sales.

He said Dynatrace understood this better than any other player in the market.

The vendor was recently named a leader in Gartner’s 2021 Magic Quadrant for APM (application performance monitoring), placing highest for ability to execute and furthest for completeness of vision. The research firm also scored Dynatrace highest in three of the four capabilities in its 2021 critical capabilities for APM report. These capabilities are essential in supporting business and DevSecOps teams and delivering observability in modern multi-cloud environments.

Gartner defines APM as software that facilitates the observation of application behaviour and its infrastructure dependencies, users, and business KPIs (key performance indicators) throughout the application’s lifecycle.

Its report this year highlights the evolution of APM, which has moved away from its days as conventional monitoring software that supported monolithic architectures and annual software release cycles. This is no longer sufficient in the current IT landscape.

As Katanasho noted, APM now must cater to business environments underpinned by cloud-native infrastructures, cloud-based applications and microservices. Should these fail to work, even for a short time, organisations will have to bear the cost of downtime and lost revenue.

This underscored the importance of intelligent observability, so enterprises can more easily navigate the increased complexity of multi-cloud environments, as well as identify and resolve potential issues quickly. Doing so would also speed up their ability to innovate.

In fact, the Dynatrace Software Intelligence Platform was designed specifically to deliver the widest observability – pulling applications, infrastructure, user experience, AIOps (artificial intelligence for IT operations), automation, and application security onto a single platform. This enables “a single source of truth” across the entire technology stack.

The platform tracks and stitches dependencies between all observability data, including metrics, logs, traces, and user experience. This topology map forms the foundation for intelligent observability.
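As a rough illustration of what stitching dependencies between observability data could look like, the sketch below builds a small dependency graph from trace spans and attaches metrics and logs to the services in it. It is not Dynatrace’s implementation: the service names, the record fields, and the `build_topology` and `downstream` helpers are all assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical telemetry records; field names are assumptions for this sketch.
spans = [
    {"trace_id": "t1", "service": "web", "calls": "checkout"},
    {"trace_id": "t1", "service": "checkout", "calls": "payments"},
    {"trace_id": "t2", "service": "web", "calls": "catalogue"},
]
metrics = [{"service": "payments", "name": "response_time_ms", "value": 950}]
logs = [{"service": "payments", "message": "timeout contacting card gateway"}]


def build_topology(spans, metrics, logs):
    """Return (edges, context): who calls whom, plus signals per service."""
    edges = defaultdict(set)
    context = defaultdict(lambda: {"metrics": [], "logs": []})
    for span in spans:
        edges[span["service"]].add(span["calls"])
    for metric in metrics:
        context[metric["service"]]["metrics"].append(metric)
    for log in logs:
        context[log["service"]]["logs"].append(log)
    return edges, context


def downstream(edges, service):
    """All services reachable from `service` by following call edges."""
    seen, stack = set(), [service]
    while stack:
        for callee in edges.get(stack.pop(), ()):
            if callee not in seen:
                seen.add(callee)
                stack.append(callee)
    return seen


edges, context = build_topology(spans, metrics, logs)
print(sorted(downstream(edges, "web")))
```

With the graph in place, a slow response seen at `web` can be read alongside the metrics and logs of every service it depends on, rather than from separate dashboards.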

Out with the old dashboards, in with new AI

According to Katanasho, traditional monitoring tools provide few answers beyond dashboard visualisations, often requiring manual root cause analysis. The problem escalates as IT environments become more complex, because the old monitoring approach was not built for scale and leaves users suffering from “dashboard fatigue”.

Katanasho said: “People don’t want more data; they want more answers. Developers want to spend their time writing new features for the software, and not on troubleshooting.”

He pointed to the use of AI, specifically, causation-based AI, as a way to deliver more precise answers. He said Dynatrace’s AI capabilities automated anomaly root-cause analysis, even within dynamic microservice environments.
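To make the idea of following causes rather than correlations more concrete, here is a minimal sketch of a deterministic root-cause walk over a service dependency graph. It is an illustration only and does not describe how Dynatrace’s AI works; the graph, the set of unhealthy services, and the `root_causes` function are hypothetical, and the walk assumes the graph has no cycles.

```python
# Hypothetical call graph: each service maps to the services it depends on.
DEPENDS_ON = {
    "web": ["checkout", "catalogue"],
    "checkout": ["payments"],
    "payments": [],
    "catalogue": [],
}
# Hypothetical anomaly flags, e.g. from threshold or baseline checks.
UNHEALTHY = {"web", "checkout", "payments"}


def root_causes(service, depends_on, unhealthy):
    """Follow unhealthy dependencies down to the deepest unhealthy services.

    Assumes an acyclic dependency graph (an assumption of this sketch).
    """
    if service not in unhealthy:
        return []
    causes = []
    for dep in depends_on.get(service, []):
        causes.extend(root_causes(dep, depends_on, unhealthy))
    # If no dependency is unhealthy, this service is where the trail ends.
    return causes or [service]


print(root_causes("web", DEPENDS_ON, UNHEALTHY))
```

The same inputs always give the same answer, which is the property that matters when a decision has to be made immediately.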

The vendor also works with cloud players such as Google and Microsoft on the OpenTelemetry project, which provides an observability framework for cloud-native applications. It offers tools, APIs (application programming interfaces), and SDKs (software development kits), which can be tapped to collect and export telemetry data including metrics, logs, and traces.

The information can then be used to better understand an organisation’s software performance and behaviour, further expanding Dynatrace’s own data source – powered by its AI engine, Davis – and augmenting cloud observability for customers.

Establishing expansive telemetry is key in the multi-cloud environments that organisations operate today to drive digital transformation. Cloud enables greater agility, speed, and scale, and companies need to tap different cloud platforms for various business requirements, such as regulatory compliance, or to use specialised services.

With each cloud running its own telemetry, it is then critical that enterprises have the software tools to help them connect all the services running across the different cloud platforms and establish “full stack observability”.

Katanasho explained that, traditionally, observability had been based on three types of data: traces, which follow transactions from start to finish; logs, which software and systems generate; and metrics, which cover technical measures such as CPU and memory utilisation and response time. It focused primarily on input from data sources.

“With intelligent observability, we’re looking at the output or the end-result of what people consume,” he said. “So, it’s about connecting all the dots and getting to the answers, rather than having you sieve through datasets.”

Here, the AI engine would look at the data and add context to it, going beyond what metrics and logs alone could offer, he added.

The speed at which businesses today need to respond and make decisions, though, meant that adding machine learning tools to the mix would not always be useful.

“[Data observability] needs to solve real-time operational issues and you don’t have one or two weeks [to wait] for the [machine learning] data model to learn before making a decision,” he said.

He added that some business decisions must be made in real time based on accurate data. In this respect, machine learning did not always provide the level of accuracy required in some operational environments.

“Businesses instead need an AI algorithm that is purpose-built for use cases, so it can help automate processes in real-time where human intervention will not be needed,” he said. Such “deterministic AI” applications would be able to deal with the need for real-time, accurate, and automated data analysis, he explained.

There are also opportunities for automation to be applied not just to IT processes, but also business operations, Katanasho noted.

For instance, when an online shopper attempts to purchase an item and the transaction fails, Dynatrace’s platform can automatically identify that a problem has occurred.

It then looks at the customer’s profile and triggers the appropriate action, such as escalating the issue to a service agent who is prompted to contact the customer with a resolution. A sales voucher can also be sent automatically to the customer with an apology and a note acknowledging the problem.
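The flow described above could be sketched roughly as follows. This is a simplified illustration, not the platform’s API: the `Customer` and `Transaction` types, the `tier` attribute, the action names, and the rule that only premium customers are escalated to an agent are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    customer_id: str
    tier: str  # hypothetical profile attribute, e.g. "premium" or "standard"


@dataclass
class Transaction:
    customer: Customer
    succeeded: bool


def actions_for(transaction):
    """Decide follow-up actions for a transaction (illustrative rules only)."""
    if transaction.succeeded:
        return []
    actions = ["send_apology_voucher"]
    if transaction.customer.tier == "premium":
        # Assumed rule: higher-value customers also get a call from an agent.
        actions.insert(0, "escalate_to_service_agent")
    return actions


failed = Transaction(Customer("c-42", "premium"), succeeded=False)
print(actions_for(failed))
```

In practice, each action name would map to a call into a ticketing or marketing system, and the rules would be set by the business rather than hard-coded.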

“The opportunities are enormous. Enterprise customers can expand our platform [to facilitate automation] for whatever they think is necessary,” he said.

Dynatrace commissioned TechTarget APAC to produce the above content which was not reviewed or influenced prior to publication.