How developers can operationalise bots – Part 2

This two-part series of guest posts for the Computer Weekly Developer Network is written by PK’s Ram Sathia, VP of intelligent automation, and Ajmeer Ali Liyakkathali, PK’s associate director of intelligent automation.

PK describes itself as the ‘experience engineering’ company — its software works in areas related to marketing touchpoints that enable brand connections like lifestyle apps, 5G and IoT transformations, personalised e-commerce and connected health care platforms.

Sathia & Liyakkathali write as follows…

Creating the right environment for fast deployment is just the first step to achieving rapid ROI with bots. To sustain the value of Robotic Process Automation (RPA) and increase its ability to automate manual work, developers must also consider tracing and recovery management.

Implement tracing & telemetry

A robot is a digital worker; therefore, many of the rules that apply to human workers must apply to digital workers as well. 

Most organisations track and log the actions of manual operations teams for compliance auditing, and the same must be done for bots.

In fact, a higher level of tracing is required for digital workers because they operate in an environment in which business employees can’t monitor their performance. In addition, if business rules and scenarios aren’t defined correctly, bots run the risk of violating them. 

To improve bot tracing in RPA initiatives, organisations can create transaction trails. 

Transaction trails, or breadcrumbs, are a sequential record of the actions a bot takes to complete a record or request. This information is critical for reporting to both business and technology teams, and for restarting failed or partial transactions.
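As a rough illustration, a transaction trail can be as simple as an ordered list of steps kept per transaction. The sketch below is not taken from any RPA product; the class name `TransactionTrail`, its fields and the invoice step names are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class TransactionTrail:
    """Ordered record of the steps a bot took for one record/request."""
    transaction_id: str
    steps: list = field(default_factory=list)

    def record(self, action: str, status: str = "ok") -> None:
        # Append one breadcrumb; insertion order is execution order.
        self.steps.append({
            "seq": len(self.steps) + 1,
            "action": action,
            "status": status,
            "at": datetime.now(timezone.utc).isoformat(),
        })

    def last_completed(self):
        # Most recent step that finished successfully, or None if there is none.
        for step in reversed(self.steps):
            if step["status"] == "ok":
                return step["action"]
        return None


# Hypothetical invoice transaction that fails on its third step.
trail = TransactionTrail("INV-001")
trail.record("open_invoice")
trail.record("extract_fields")
trail.record("post_to_erp", status="failed")
print(trail.last_completed())
```

The last completed step tells a restart routine where to pick the transaction up again, and the full list of steps can feed business or technical reports.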

Track exceptions

Exceptions are events that disrupt the normal flow of program execution, or rules that are unknown or unexpected. Business exceptions help identify process-related waste and opportunities for improvement, whereas technical exceptions help IT improve the stability of bots.
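One way to keep the two kinds of exception apart is to give each its own type and count them separately. This is a minimal sketch under assumed names: `BusinessException`, `TechnicalException`, the `process` function and the invoice fields are illustrative, not part of any specific RPA platform.

```python
from collections import Counter


class BusinessException(Exception):
    """A rule or data problem in the process itself (hypothetical type)."""


class TechnicalException(Exception):
    """A failure in an application or the infrastructure (hypothetical type)."""


def process(invoice: dict) -> str:
    # Stand-in for real bot work; raises to simulate both exception kinds.
    if invoice.get("amount") is None:
        raise BusinessException("missing amount")
    if invoice.get("system_down"):
        raise TechnicalException("target application unavailable")
    return "done"


def run_batch(invoices) -> Counter:
    # Tally outcomes so business and IT teams each get their own figure.
    counts = Counter()
    for invoice in invoices:
        try:
            process(invoice)
            counts["success"] += 1
        except BusinessException:
            counts["business_exception"] += 1
        except TechnicalException:
            counts["technical_exception"] += 1
    return counts


batch = [{"amount": 10}, {"amount": None}, {"amount": 5, "system_down": True}]
counts = run_batch(batch)
print(dict(counts))
```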

Establish logs. Logs record events at levels such as informational, debug or fatal, which helps diagnose errors. They are also the main drivers for monitoring alerts.
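The link between log levels and alerts can be sketched with Python's standard `logging` module. The logger name `invoice_bot` and the `AlertHandler` class are assumptions made for this example; a real deployment would forward alerts to a monitoring tool rather than keep them in a list.

```python
import logging


class AlertHandler(logging.Handler):
    """Collects records at ERROR and above; a stand-in for a real alert channel."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.alerts = []

    def emit(self, record):
        self.alerts.append(f"{record.levelname}: {record.getMessage()}")


logger = logging.getLogger("invoice_bot")  # hypothetical bot name
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the example isolated from the root logger
alert_handler = AlertHandler()
logger.addHandler(alert_handler)

logger.info("transaction INV-001 started")       # informational
logger.debug("selector resolved in %d ms", 120)  # debug
logger.critical("cannot reach target application")  # fatal: raises an alert

print(alert_handler.alerts)
```

Only the critical record reaches the alert handler here, because the handler's level filters out informational and debug events.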

Trace assets

Monitoring CPU, memory, Windows events, services and OS updates on the machines and servers of the RPA infrastructure helps developers stay a step ahead of errors and optimise bot performance.

Enable security tracing. Failing to monitor the logs and documents that bots leave on the system or server poses a serious and often overlooked security threat. Remember that bots must follow many of the same rules as human workers, so be sure to track compliance, for example through an audit log.

Measure bot success with telemetry. The ROI that bots unlock for an enterprise depends on factors like transaction status, transaction volume, business exceptions, automation rate, actual hours automated vs. projected, actual volume processed vs. projected volume, and bot cycle time compared to human workers. Tracking these metrics helps organisations measure the success of automation and perform analyses to correct the failures, exceptions and volume drops that affect ROI.
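A few of these metrics can be derived from per-transaction status records. The sketch below uses assumed definitions, since the exact formulas vary by organisation: automation rate is taken as successful transactions divided by total transactions, and hours automated as successful transactions multiplied by an assumed manual handling time. The `summarise` function and its field names are hypothetical.

```python
def summarise(transactions, projected_volume, manual_minutes_per_item):
    """Compute a small telemetry summary from per-transaction status records."""
    total = len(transactions)
    succeeded = sum(1 for t in transactions if t["status"] == "success")
    business_exc = sum(
        1 for t in transactions if t["status"] == "business_exception"
    )
    return {
        "volume": total,
        # Assumed definition: share of transactions completed without a human.
        "automation_rate": succeeded / total if total else 0.0,
        "business_exceptions": business_exc,
        "volume_vs_projected": total / projected_volume if projected_volume else 0.0,
        # Assumed definition: manual effort avoided by successful transactions.
        "hours_automated": succeeded * manual_minutes_per_item / 60,
    }


# Illustrative data only: three successes and one business exception.
runs = [
    {"status": "success"},
    {"status": "success"},
    {"status": "success"},
    {"status": "business_exception"},
]
report = summarise(runs, projected_volume=5, manual_minutes_per_item=15)
print(report)
```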

Use recovery management best practices. A bot’s ability to recover after an error helps maximise uptime and lets it complete a greater number of manual tasks. Developers should focus on two critical qualities to enable recovery when working on bots: high availability and the ability to self-heal.

Availability & healing

Building a high availability infrastructure, such as by setting up a two-node load balancer in an active/passive configuration or setting up disaster recovery with an additional data center and reserve pool of machines, helps RPA programs support multiple bots and offer better performance and failure resistance. When one bot fails, the other bots can pick up the load and provide continuity of work.

If not developed to be self-healing, bots can be unreliable. Use AI and automated scripts to correlate bot failures with matching patterns. Design bots with a recovery framework that logs process milestones, so that bots have checkpoints to resume from after a failure.
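The milestone-logging part of such a recovery framework might look like the following sketch, which persists each completed step to a JSON checkpoint file and skips those steps on the next run. It does not attempt the AI-based failure correlation. The step names, the `run` function and its `fail_at` parameter (used only to simulate a failure) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical process milestones, in execution order.
STEPS = ["login", "download_report", "enter_data", "logout"]


def run(checkpoint: Path, fail_at=None):
    """Run STEPS in order, skipping milestones already logged in the checkpoint."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    executed = []
    for step in STEPS:
        if step in done:
            continue  # completed in an earlier run
        if step == fail_at:
            raise RuntimeError(f"simulated failure at {step}")
        executed.append(step)
        done.append(step)
        # Persist the milestone immediately so a crash loses no progress.
        checkpoint.write_text(json.dumps(done))
    return executed


with tempfile.TemporaryDirectory() as tmp:
    checkpoint_file = Path(tmp) / "INV-001.json"
    try:
        run(checkpoint_file, fail_at="enter_data")  # first attempt fails midway
    except RuntimeError:
        pass
    resumed = run(checkpoint_file)  # second attempt resumes from the checkpoint

print(resumed)
```

The second run executes only the steps that had not been logged before the failure, which is the behaviour a checkpoint is meant to provide.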

These best practices are all fundamental principles of RPA reliability engineering. After all, developing bots is only half the battle; ensuring that they perform effectively in real-world work environments is just as essential to the success of RPA.

PK’s Ajmeer Ali Liyakkathali (left) and Ram Sathia (right)
