Google: native C in Hadoop with MapReduce for C

Google’s open source elves have released MapReduce for C (MR4C), an open source framework to run native C-language code in Hadoop.



MR4C is designed to let programmers run their native C and C++ code within the Hadoop execution framework, so that they can use Hadoop's big data analytics capabilities.

Pairing the performance and flexibility of natively developed algorithms with the “unfettered scalability and throughput” inherent in Hadoop, MR4C enables large-scale deployment of advanced data processing applications.

MapReduce itself can be described as a “programming model” for processing (and ultimately also generating) very large data sets with a parallel, distributed algorithm: a map step turns input records into intermediate key/value pairs, and a reduce step combines the values that share a key.
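To make the programming model concrete, here is a minimal single-process sketch of a MapReduce-style word count in C++. It is an illustration only: it does not use MR4C or Hadoop, and the function names (map_record, group_by_key, reduce_values, word_count) are hypothetical names chosen for this example. On a real cluster the map and reduce steps would run in parallel across many machines.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical types and names for illustration; not part of MR4C or Hadoop.
using KeyValue = std::pair<std::string, int>;

// Map step: emit (word, 1) for every whitespace-separated word in one record.
std::vector<KeyValue> map_record(const std::string& record) {
    std::vector<KeyValue> out;
    std::istringstream in(record);
    std::string word;
    while (in >> word) {
        out.emplace_back(word, 1);
    }
    return out;
}

// Shuffle step: group every emitted value under its key.
std::map<std::string, std::vector<int>> group_by_key(
        const std::vector<KeyValue>& pairs) {
    std::map<std::string, std::vector<int>> groups;
    for (const auto& kv : pairs) {
        groups[kv.first].push_back(kv.second);
    }
    return groups;
}

// Reduce step: collapse all values for one key into a single total.
int reduce_values(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}

// Driver: run map over every record, group the output, then reduce each group.
std::map<std::string, int> word_count(const std::vector<std::string>& records) {
    std::vector<KeyValue> emitted;
    for (const auto& record : records) {
        std::vector<KeyValue> part = map_record(record);
        emitted.insert(emitted.end(), part.begin(), part.end());
    }
    std::map<std::string, int> result;
    for (const auto& group : group_by_key(emitted)) {
        result[group.first] = reduce_values(group.second);
    }
    return result;
}
```

Calling word_count on a list of text lines returns a map from each word to the number of times it appears. The point of the split is that map_record only ever sees one record and reduce_values only ever sees one key, which is what lets a framework spread the work across a cluster.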


Hadoop is of course written in Java, so being able to run native code directly means developers can avoid having to construct additional libraries to bridge the two.

Still … why?

What kind of workload is so big that a framework like this would need to be developed?

Examples include:

• High performance scientific computing

• Satellite image processing

• Industrial data clusters serving Internet of Things

• Geospatial data science

… and finally, why?

Google wants to abstract away the details of MapReduce (as a programming model and framework) so that developers can create more pure-bred algorithms, which (in theory) will perform with more power, flexibility and speed.

Google explains that it was attracted by the job tracking and cluster management capabilities of Hadoop for scalable data handling, but wanted to leverage the image processing libraries that have been developed in C and C++.

“While many software companies that deal with large datasets have built proprietary systems to execute native code in MapReduce frameworks, MR4C represents a flexible solution in this space for use and development by the open source community,” said Google, on its open source blog.

Blogger Ty Kennedy-Bowdoin continues, “MR4C is developed around a few simple concepts that facilitate moving your native code to Hadoop. Algorithms are stored in native shared objects that access data from the local filesystem or any uniform resource identifier (URI), while input/output datasets, runtime parameters, and any external libraries are configured using JavaScript Object Notation (JSON) files.”
