Yahoo opens Pulsar 'pub-sub' messaging system

Embattled, bygone-era search firm Yahoo (exclamation point not included) has open-sourced Pulsar, a scalable, low-latency ‘pub-sub’ (publish-subscribe) messaging system. The technology provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers and cross-datacenter replication.
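To make those terms concrete, here is a minimal in-memory sketch of a topic with per-subscription cursors and at-least-once delivery. It is a toy model written for this article, not Pulsar's code or API: the class and method names (Topic, publish, receive, acknowledge) are hypothetical, there is no networking, persistence or replication, and a new subscription is simply assumed to start at the beginning of the log.

```python
from collections import defaultdict


class Topic:
    """Toy topic: an append-only log plus one cursor per subscription.

    Hypothetical illustration only; this is not Pulsar's implementation.
    """

    def __init__(self):
        self.log = []                    # messages in publish order
        self.cursors = defaultdict(int)  # subscription -> next unacknowledged offset

    def publish(self, message):
        """Append a message and return its offset in the log."""
        self.log.append(message)
        return len(self.log) - 1

    def receive(self, subscription):
        """Return (offset, message) at the cursor, or None if caught up.

        The cursor does not move here, so a message that is never
        acknowledged is handed out again on the next call. That
        redelivery is what makes delivery at-least-once.
        """
        offset = self.cursors[subscription]
        if offset >= len(self.log):
            return None
        return offset, self.log[offset]

    def acknowledge(self, subscription, offset):
        """Advance the cursor once the consumer has processed `offset`."""
        if offset == self.cursors[subscription]:
            self.cursors[subscription] = offset + 1


topic = Topic()
topic.publish("order-created")
topic.publish("order-paid")

first = topic.receive("billing")    # delivered, not yet acknowledged
again = topic.receive("billing")    # same message delivered again
topic.acknowledge("billing", first[0])
following = topic.receive("billing")  # cursor has moved to the next message
```

Because the topic tracks the cursor, a subscriber does not have to remember its own position; the price is that a consumer which crashes before acknowledging may see the same message twice and has to tolerate duplicates.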

What is pub-sub messaging?

Pub-sub messaging is a very common design pattern in which publishers send messages to named topics and subscribers receive the messages on the topics they have subscribed to, without either side addressing the other directly. It is increasingly found in the distributed systems that power Internet applications. These applications provide real-time services and need publish latencies of 5 milliseconds on average and no more than 15 milliseconds at the 99th percentile. At Internet scale, they require a messaging system with ordering, strong durability and delivery guarantees. To meet the “five nines” durability requirements of a production environment, messages have to be committed on multiple disks or nodes.
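The latency target quoted above combines two different statistics, and a set of measurements can pass one while failing the other. The sketch below checks a list of publish latencies against both limits. The function names are hypothetical and the nearest-rank percentile method is an assumption; the article does not say how Yahoo measures the 99th percentile.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def meets_latency_target(latencies_ms, mean_limit_ms=5.0, p99_limit_ms=15.0):
    """True if the mean is within 5 ms and the 99th percentile within 15 ms."""
    mean = sum(latencies_ms) / len(latencies_ms)
    return mean <= mean_limit_ms and percentile(latencies_ms, 99) <= p99_limit_ms


# 100 hypothetical measurements: mostly fast, with one slow outlier.
samples = [2.0] * 98 + [14.0, 40.0]
ok = meets_latency_target(samples)
```

In this made-up sample the single 40 ms outlier sits above the 99th percentile, so it does not break the target; a second publish slower than 15 ms would.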

Yahoo engineers explain that they could not find any existing open-source messaging solution that could provide the scale, performance and features Yahoo required to provide messaging as a hosted service, supporting a million topics.

“So we set out to build Pulsar as a general messaging solution that also addresses these specific requirements,” say Joe Francis and Matteo Merli of Yahoo Platforms.

Using Pulsar, one can set up a centrally-managed cluster to provide pub-sub messaging as a service; applications can be onboarded as tenants.

Pub-sub-as-a-Service

Pulsar is also horizontally scalable; the number of topics, messages processed, throughput and storage capacity can be expanded by adding servers to the pool.

“Pulsar has a robust set of APIs to manage the service, namely, account management activities like provisioning users, allocating capacity, accounting usage, and monitoring the service. Tenants can administer, manage, and monitor their own domains via APIs. Pulsar also provides security via a pluggable authentication scheme, and access control features that let tenants manage access to their data,” said Francis and Merli.

Pulsar includes a client library that encapsulates the messaging protocol; complex functions like service discovery, as well as connection establishment and recovery, are handled internally by the library.
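As a rough picture of what hiding discovery and recovery inside a library can look like, the sketch below wraps a lookup function and a retry loop behind a single send call. Everything in it is an assumption made for illustration: ToyClient, ConnectionLost and the lookup callable are hypothetical names, and none of this is taken from the Pulsar client library.

```python
class ConnectionLost(Exception):
    """Raised by a broker connection when the link drops (hypothetical)."""


class ToyClient:
    """Hides broker lookup and reconnection behind a single send() call."""

    def __init__(self, lookup, max_attempts=3):
        if max_attempts < 1:
            raise ValueError("max_attempts must be at least 1")
        self._lookup = lookup          # callable: topic -> broker connection
        self._max_attempts = max_attempts
        self._brokers = {}             # topic -> cached broker connection

    def send(self, topic, message):
        last_error = None
        for _ in range(self._max_attempts):
            broker = self._brokers.get(topic)
            if broker is None:
                # Service discovery: find out which broker serves the topic.
                broker = self._lookup(topic)
                self._brokers[topic] = broker
            try:
                return broker.send(topic, message)
            except ConnectionLost as err:
                # Recovery: drop the stale connection so that the next
                # attempt performs a fresh lookup.
                last_error = err
                del self._brokers[topic]
        raise last_error
```

The application only ever calls send; which broker owns the topic, and what happens when a connection drops, stay inside the client. A real client would also have to handle backoff, timeouts and concurrent callers, which this sketch leaves out.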