NetWinder Beowulf Cluster


Objective

To lower the compile time of the Linux kernel for the NetWinder.

Requirements

The requirements are (in no particular order):

Note that, unless I specify otherwise, I am concentrating on compiling the Linux kernel. Any blanket statements I make about optimal numbers of nodes and timings may not transfer to other programs. For example, for XEmacs, the more nodes the better, assuming that you have the bandwidth.

The Hardware

While the cluster can contain any number of NetWinders, my test cluster consists of a full-blown NetWinder DM with 64M of memory and a 4G hard disk as the head unit, and 4 diskless NetWinders with 32M of memory each as the clients. There is no swap on the clients. Memory usage tends to peak at 27M for the kernel.

All the NetWinders are connected via a 5-port 10Mb/s hub. With a 10Mb/s hub, the cluster is limited to about 5 clients. Above 5, the compile time actually goes up due to lack of bandwidth on the hub. So a 5-port hub with 4 clients works well. With only 4 clients, switching to a 100Mb/s hub does not improve performance.

[Figure: Topology of the cluster]

I chose diskless clients because they require less power (and generate less heat) and for ease of maintenance. Only two versions of the code must be kept in sync, and it is very easy to add or remove clients.

The Software

Linux

All the NetWinders are running Linux with a 2.2.13 kernel. The clients boot the kernel over TFTP and mount their file systems over NFS.

MMM

A quick perusal of GNU make shows that it contains hooks for `remote' execution that are perfect for the cluster. Basically, whenever it finds "work" to do (e.g. a make, a compile, a link), it checks to see if the work should be done "remotely". Note that in this context remote only means "outside of GNU make".

With the use of the `-jN' option to GNU make and the mmm package, you can spread the load out over multiple machines.

MMM uses a client/server model. A server process (mmserver) runs on the head unit, which is also where the makes are started. A client process (mmclient) runs on each of the other machines in the cluster. A client process can also run on the head unit, but at 4 clients this tends to hurt performance.

GNU make must be recompiled to include the code to talk to the mmserver and mmclients. All communication is done through sockets. The following figure shows the socket communication.

[Figure: Awesome message passing diagram that probably only makes sense to me.]

Limitations of mmm:

NFS

For the remote make to work, the remote machines must have access to the same directory structure and files as the head machine. The directory structure must match exactly, since the commands are passed to the remote machines along with the directory name.

I chose to use NFS since it comes with Linux and is very simple to set up. Since I compile from the /home directory, I put the /home directory in its own partition to make it easy to mount on the remote clients.
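Because a remote compile fails if the shared directory is missing, it can be worth checking the mount before starting a client. The following is a minimal sketch, not part of mmm: `is_nfs_mount` is a hypothetical helper name, and it assumes the mount table is readable in the `/proc/mounts` format (device, mount point, file system type, options). It only matches an exact mount point, so a directory below the mount point would not be detected.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR is listed as an NFS mount point.
# The second argument is the mount table to read; it defaults to
# /proc/mounts, where each line reads:
#   device mountpoint fstype options dump pass
is_nfs_mount() {
    dir=$1
    mounts=${2:-/proc/mounts}
    awk -v dir="$dir" '
        $2 == dir && $3 ~ /^nfs/ { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$mounts"
}

# Example: report whether /home is shared before starting a client.
if is_nfs_mount /home; then
    echo "/home is NFS-mounted"
else
    echo "/home is not NFS-mounted; remote compiles would fail"
fi
```

The mount table is passed as a parameter so the check can be tried against a saved copy of a client's mount table as well as the live one.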

XNTP

For make to work properly, the clocks on all the clients must be in sync with the head, because make decides what to rebuild by comparing file timestamps. I chose XNTP to keep the clients in sync since it comes standard on the NetWinders and is easy to configure in this simple application.
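To see why clock skew matters, here is a small simulation of the timestamp comparison. It is a simplified stand-in for what make does, not make's actual code: `needs_rebuild` is a hypothetical name, and the timestamps are set by hand with `touch -t` to mimic an object file stamped by a clock running one hour behind. The `-nt` test is not in older POSIX specifications, but bash, dash and ksh all support it.

```shell
#!/bin/sh
# Simplified staleness test: a target needs rebuilding if it is missing
# or if its source has a newer modification time than the target.
needs_rebuild() {
    target=$1
    src=$2
    [ ! -e "$target" ] || [ "$src" -nt "$target" ]
}

# Simulated clock skew: the source is stamped 12:00, but the object file
# built from it afterwards is stamped 11:00 by a clock one hour behind.
dir=$(mktemp -d)
touch -t 200012061200 "$dir/foo.c"
touch -t 200012061100 "$dir/foo.o"

if needs_rebuild "$dir/foo.o" "$dir/foo.c"; then
    echo "foo.o looks older than foo.c, so it would be rebuilt on every run"
fi
rm -rf "$dir"
```

The opposite skew is worse: a source edited under a slow clock can look older than an existing object file, and the change is then silently left out of the build.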

Compiling the Kernel

You have mmserver running on a machine and mmclients happily running on your cluster, now what? There are many approaches to using `make -jN'. The first is the straightforward single directory build. Just use `make -jN' where N is the number of mmclients running. For a slightly more complicated example of a main directory containing subdirs (such as xemacs), add the -jN to the subdirectory rule in the top level Makefile.

For the kernel, with a small number of clients, I recommend the simple `make -jN' approach. This will get your compiles down to the 8 minute range. You will not get below 8 minutes for this configuration since that is the time for the longest make (the net directory).

I also move around the order of the SUBDIRS line in the top level Makefile.

    SUBDIRS		=kernel net fs drivers mm ipc lib
    

This helps speed up the compiles since the net and fs directories take the longest to compile and should be started first. The table below shows the times to compile the individual directories. On the NetWinder most drivers are modules and therefore do not show up in the time for the drivers directory below.

Directory   Time     Directory          Time     Directory      Time
net         07:45    mm                 01:11    arch/arm/mm    00:21
fs          05:54    arch/arm/kernel    01:09    ipc            00:19
drivers     04:11    arch/arm/nwfpe     00:45    arch/arm/lib   00:06
kernel      01:26    arch/arm/special   00:33    lib            00:04

From the table we can also see that, using the top-level `make -jN' scheme, the best we can do with the Linux kernel is about 8 minutes, since the net directory alone takes 07:45.
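The 8 minute floor can be checked with a little arithmetic on the table above. The sketch below is my own idealised model, not a measurement: it assumes each directory is one indivisible job, ignores the final link and any network overhead, and takes the bound for N jobs to be the larger of the longest directory and the total time divided by N. The function names `dir_times` and `lower_bound` are made up for this illustration.

```shell
#!/bin/sh
# Per-directory compile times (mm:ss) copied from the table above.
dir_times() {
    cat <<'EOF'
net 07:45
fs 05:54
drivers 04:11
kernel 01:26
mm 01:11
arch/arm/kernel 01:09
arch/arm/nwfpe 00:45
arch/arm/special 00:33
arch/arm/mm 00:21
ipc 00:19
arch/arm/lib 00:06
lib 00:04
EOF
}

# Idealised lower bound for N parallel jobs when every directory is one
# indivisible job: max(longest directory, ceil(total / N)), as mm:ss.
lower_bound() {
    dir_times | awk -v n="$1" '
        { split($2, t, ":"); s = t[1] * 60 + t[2]
          total += s; if (s > longest) longest = s }
        END {
            share = int((total + n - 1) / n)      # ceil(total / n)
            bound = (share > longest) ? share : longest
            printf "%02d:%02d\n", int(bound / 60), bound % 60
        }'
}

for n in 1 2 3 4; do
    echo "N=$n: at least $(lower_bound "$n")"
done
```

With one job the bound is simply the sum of the directory times. From four jobs on, the bound stops improving because the net directory dominates, which is why adding clients beyond that cannot help this scheme.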

Compile Times

The table at the end of this section shows the times taken to compile the Linux kernel with various numbers of clients. Note that the time bottoms out at just over 8 minutes, as we expected.

Before taking the times, I checked out a fresh version of armlinux version 2.2.13. I then performed the following steps:

  cd armlinux
  cp arch/arm/def-configs/netwinder .config
  make oldconfig
  make dep
  # edit Makefile and change the SUBDIRS line as mentioned above
  make -j4
    

Now that everything is clean and I know I have a kernel that compiles, I perform the following steps for each number of clients N. To be 100% correct, I should have done `time make -jN Image'.

  make clean
  time make -jN
    
Clients   Time
base      23:47
2         12:08
3         08:46
4         08:20
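To put these times in perspective, the speedup over the base time can be computed as base time divided by parallel time. This is only a sketch: `to_seconds` and `speedup` are hypothetical helper names, and it assumes that "base" means the build without the cluster.

```shell
#!/bin/sh
# Convert an mm:ss time to seconds.
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 60 + $2 }'
}

# Speedup of a parallel run relative to the base time (both mm:ss).
speedup() {
    base=$(to_seconds "$1")
    run=$(to_seconds "$2")
    awk -v b="$base" -v r="$run" 'BEGIN { printf "%.2f\n", b / r }'
}

# Times from the table above; the base time is 23:47.
for entry in 2=12:08 3=08:46 4=08:20; do
    n=${entry%%=*}      # number of clients
    t=${entry#*=}       # measured compile time
    echo "$n clients: $(speedup 23:47 "$t")x"
done
```

Two clients come out at almost exactly twice as fast as the base, while the gain from the third to the fourth client is small, which matches the limit set by the net directory.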

Sean MacLennan
Last modified: Wed Dec 6 19:02:50 EST 2000


