Inspired by a desire to learn more about Hadoop, and by the fact that I already owned a Raspberry Pi, I wondered whether anyone had yet built a Hadoop cluster based on these hobby computers. I wasn’t surprised to discover that people had already done this, and the following instructions are where I started:

Jonas’s instructions are based on Hadoop version 1.0 and Carsten’s are based on version 2.x. If, like me, you’re interested in building with the newer version of Hadoop, then follow Carsten’s instructions, but read through Jonas’s too, because he provides useful links for downloading the Raspbian distribution (a Linux operating system built specifically for the Raspberry Pi) as well as commands and example files for testing your cluster.

The first stage is to build a single-node cluster in which your one node performs all roles, such as NameNode, Secondary NameNode and DataNode. Once you have this up and running, you’re ready to add a second node. This second node will be a dedicated DataNode from which you will clone all subsequent DataNodes. Creating the second node is slightly more difficult, which is why I decided to write this post, in the hope that it will save others time and effort.

