YugaByte: 7 core IoT developer skills

YugaByte is a newly established company that sets out to deliver what it describes as a turnkey distributed, consistent and highly available database, delivering data access with cache-level performance.

The core YugaByte database offering (logically enough, called YugaByte DB) aims to reduce the learning curve associated with big-brand, well-known databases by combining the best of the SQL and NoSQL paradigms in one unified platform.

In essence, YugaByte says the product is purpose-built for agility inside cloud-native infrastructures; the firm’s founders have suggested that it represents a new breed of ‘distributed’ systems.

Recently emerged from stealth mode [as in corporate launch, not as in video game], YugaByte is co-founded by ex-Facebook engineers Kannan Muthukkaruppan, Karthik Ranganathan and Mikhail Bautin.

7 core IoT developer skills

Providing the Computer Weekly Developer Network with some insight into its views on software application development for the Internet of Things (a key potential use case for YugaByte, claims the company), the co-founders have suggested 7 core IoT developer skills that programmers need to embrace if they choose to work in the IoT space.

Muthukkaruppan, Ranganathan and Bautin write from this point onwards…

1 – Data Collection:

Typically, data agents are deployed on the various devices and can preprocess the raw data where necessary. These agents then send the data to a well-known endpoint (typically a load balancer) via a persistent queue. These persistent queues, which provide store-and-forward functionality, are often implemented using the “emitter” (producer) component of a messaging bus such as Apache Kafka.
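The store-and-forward pattern above can be sketched in a few lines. This is a minimal, illustrative agent: in production the queue would be durable (on disk or via Kafka itself) and the sender would be a real Kafka producer; here an injected `send` callable and an in-memory queue keep the example self-contained, and all names are assumptions rather than a real API.

```python
import json
from collections import deque

class EdgeAgent:
    """Toy store-and-forward agent: preprocess, queue locally, then flush."""

    def __init__(self, send):
        self.queue = deque()   # stand-in for a persistent on-device queue
        self.send = send       # e.g. a Kafka producer's send callable

    def preprocess(self, reading):
        # Example preprocessing: drop obviously bad samples, round values.
        if reading.get("value") is None:
            return None
        return {"device": reading["device"], "value": round(reading["value"], 2)}

    def collect(self, reading):
        msg = self.preprocess(reading)
        if msg is not None:
            self.queue.append(msg)          # store first...

    def flush(self):
        sent = 0
        while self.queue:
            msg = self.queue[0]
            try:
                self.send(json.dumps(msg))  # ...then forward to the endpoint
            except ConnectionError:
                break                       # endpoint unreachable: retry later
            self.queue.popleft()
            sent += 1
        return sent
```

The key property is that `collect` never blocks on the network: data is durable locally and `flush` drains the queue only when the endpoint accepts it.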

2 – Data Ingestion:

The data received by the load balancer is sent to the “receiver” (consumer) component of the messaging bus, Apache Kafka again being a popular choice. Very often, these massive streams of data coming from the edge are written to a database for persistence and sent on to real-time data processing pipelines.
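The dual write described here (persist and forward) can be sketched as a simple fan-out. The dict and list below are stand-ins for a database and a stream-processing pipeline; in a real deployment each `raw` message would come off a Kafka consumer, and the field names are purely illustrative.

```python
import json

def ingest(messages, database, pipeline):
    """Persist each incoming message and forward it for real-time processing."""
    for raw in messages:
        event = json.loads(raw)
        # Persist, keyed by device and timestamp (a toy stand-in for a DB write)...
        database[(event["device"], event["ts"])] = event["value"]
        # ...and forward the same event to the real-time pipeline.
        pipeline.append(event)
    return len(pipeline)
```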

3 – Data Processing & Analytics:

The data processing and analytics stage derives useful information from the raw data stream. The processing may range from simple aggregations to machine learning. Applications these data processors may power include recommendation systems, user personalisation and fraud alerting. Common tool choices here include Apache Spark and TensorFlow.
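At the simple-aggregation end of that range, a toy version of the kind of computation a Spark streaming job might run is a per-device mean over a batch of events. This is plain Python rather than the Spark API, and the field names are assumptions:

```python
from collections import defaultdict

def mean_per_device(events):
    """Compute the mean reading per device over a batch of events."""
    totals = defaultdict(lambda: [0.0, 0])   # device -> [sum, count]
    for e in events:
        t = totals[e["device"]]
        t[0] += e["value"]
        t[1] += 1
    return {dev: s / n for dev, (s, n) in totals.items()}
```

In Spark this would be a `groupBy`/aggregate over a windowed stream; the shape of the computation is the same.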

4 – Data Storage:

A transanalytic (hybrid transactional/analytical) database is needed to store data in a servable form as well as for deriving business intelligence from the collected data. The database needs to be efficient at storing large amounts of data across many servers and highly elastic to meet the growing demands of the data sets. It must be capable of powering low-latency, user-facing requests, web applications and dashboards, while simultaneously integrating well with real-time analytics tools (such as Apache Spark). Databases such as YugaByte DB and Apache Cassandra are good choices for this tier.
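The "large amounts of data over many servers" property usually comes down to how rows are partitioned. The sketch below illustrates one common time-series layout, bucketing rows under a (device, day) partition key so writes spread across servers while recent data for one device stays co-located; it mimics in a plain dict the kind of schema a Cassandra-compatible table might use, and is illustrative only.

```python
def partition_key(device, ts_seconds):
    """Bucket a timestamp into whole days so one device's data spans many partitions."""
    day = ts_seconds // 86400
    return (device, day)

def insert(store, device, ts, value):
    """Append a reading under its (device, day) partition."""
    store.setdefault(partition_key(device, ts), []).append((ts, value))
```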

5 – Data Visualisation:

Mobile and web applications need to be built to power end-user experiences such as a performance-indicator dashboard or a customised music playlist for a logged-in user. Frameworks such as Node.js or Spring Boot, along with WebSockets, jQuery and Bootstrap, are popular options here.

6 – Data Lifecycle Management:

Some use cases need to retain historical data forever and hence need to automatically tier older data off to cheaper storage. Others need an easy, intent-based way to expire older data, such as by specifying a Time-To-Live (aka TTL). And last but not least, for business-critical data sets it is essential to have data protection and replication for disaster recovery and compliance requirements. The database tier should be capable of supporting all of these; YugaByte DB is a good option for some of these requirements.
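The TTL idea can be sketched in a few lines: each record carries an expiry deadline and a sweep removes anything past it. A database with native TTL support does this automatically at write time (e.g. Cassandra's `USING TTL` clause); the function names and layout below are assumptions for illustration.

```python
def put(store, key, value, now, ttl):
    """Store a value together with its expiry deadline (now + ttl seconds)."""
    store[key] = (value, now + ttl)

def sweep(store, now):
    """Remove every record whose TTL deadline has passed; return the count removed."""
    expired = [k for k, (_, deadline) in store.items() if deadline <= now]
    for k in expired:
        del store[k]
    return len(expired)
```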

7 – Data Infrastructure Management:

The number of deployed devices and the ingest rate can vary rapidly, requiring the data processing tier and the database to scale out (or shrink) reliably and efficiently. Orchestration systems such as Kubernetes and Mesos are great choices for automating the deployment, management and scaling of infrastructure up and down as a function of business growth.
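In Kubernetes, the scale-out-and-shrink behaviour described above is typically expressed as a HorizontalPodAutoscaler. The fragment below is a hypothetical sketch: the deployment name and thresholds are illustrative, not a recommended configuration.

```yaml
# Hypothetical autoscaler for a data-processing tier; names and thresholds
# are illustrative only.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: ingest-workers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: ingest-workers      # the processing tier to scale
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70   # add pods when average CPU exceeds 70%
```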