Somebody set up us the Beowulf

Posted by cwright on 2007.06.11 @ 10:06


Recently we had an interesting opportunity to deploy seven identical customized machines for one of Kosada’s consulting clients. We’ve been working on disk images to make this quick and painless, and have more or less succeeded. However, there are a few different methods for getting an archived image onto the machines, depending on circumstance. We also pay a penalty every time the underlying hardware changes, since the image bundles in specific drivers. Usually we’re able to work around this with minimal pain.

Excitingly, these new machines broke the mold (they’re slightly older, considerably cheaper machines), so we had to tweak the image a bit.

This was thankfully quite painless. In a burst of excitement, we also decided that on the morrow we would take our working image and use it as the source for the other machines, rather than using an archive from the e-mail/intranet server. By doing this, we could completely uncap transfer speeds on our tiny network (until now we’d been heavily capping them, because saturating the network makes all the other network services fall over).

The reason why this is at all interesting or important? Folding@Home.

These new machines aren’t expected to be deployed in the immediate future. Up until now we’ve usually just put undeployed machines under a desk to bide their time until they go live. With this new batch of machines, though, we’d be crazy not to put this block of idle computing power to altruistic use.

The only toolkit we used was SystemRescueCD.

First, we configured the network settings. This is easily accomplished with the well-known tool ifconfig. Nothing fancy there.
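For a network this simple, the whole step is one command per machine. The addresses below are assumptions for illustration — the post doesn’t name the actual subnet:

```shell
# Bring up eth0 with a static address on the private imaging LAN.
# 192.168.1.0/24 is an assumed subnet -- substitute your own.
ifconfig eth0 192.168.1.10 netmask 255.255.255.0 up

# Verify the interface actually came up.
ifconfig eth0
```

Each machine gets its own address in the same subnet; with no services to reach beyond the LAN, no gateway or DNS is needed.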

Next, we configured the root password. Since this was a completely private network, we chose a weak password. Not a big deal. passwd is your friend.
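Scripted across seven machines, the same step can be done non-interactively. This sketch assumes chpasswd is available on the rescue CD — interactively, plain passwd does the same job:

```shell
# Set a throwaway root password without a prompt. Weak on purpose:
# the imaging LAN is completely private, and the image gets a real
# password before deployment.
echo 'root:imaging' | chpasswd
```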

Then we started actually trying sftp, ftp, and ssh, and found that none of them runs out of the box. A quick /usr/sbin/sshd revealed an interesting error: no key in /etc/ssh/ssh_host_rsa_key or /etc/ssh/ssh_host_dsa_key. To generate the keys, we ran ssh-keygen -t rsa -b 768 -f /etc/ssh/ssh_host_rsa_key -N '' and then ssh-keygen -t dsa -f /etc/ssh/ssh_host_dsa_key -N ''. (Note the -t flags: without one, ssh-keygen defaults to an RSA key, which is the wrong type for the DSA path.) Then another quick /usr/sbin/sshd and the image master was in business.
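The key-generation step, condensed. A 768-bit RSA key was acceptable in 2007, but modern ssh-keygen refuses anything under 1024 bits, so this sketch uses 2048 and writes to a scratch directory rather than /etc/ssh:

```shell
# Generate a host key so sshd has something to present to clients.
# -N '' sets an empty passphrase, which host keys require.
mkdir -p /tmp/hostkeys
ssh-keygen -q -t rsa -b 2048 -f /tmp/hostkeys/ssh_host_rsa_key -N ''

# sshd expects the private key plus the matching .pub next to it.
ls /tmp/hostkeys
```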

This revealed yet another problem, this time a conservative restriction in sftp: you can’t get /dev/hda, because it’s “not a regular file.” This posed quite a problem. However, we found that by repeating the steps above on each machine, we could have the image master “push” the image onto the others like so: cat /dev/hda | ssh root@[ip address] cat \> /dev/hda. This makes the image master dump its hard drive and tunnel it to the others over ssh, where the tunneled data is written straight back to /dev/hda, the hard drive.
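The pipe shape is easy to exercise safely with scratch files standing in for /dev/hda; the ssh hop is elided here, but on the real machines both ends of the pipe were the raw disk device:

```shell
# Stand-in for the master disk: 1 MB of random data.
head -c 1048576 /dev/urandom > /tmp/master.img

# The real command was:
#   cat /dev/hda | ssh root@[ip address] cat \> /dev/hda
# Locally, the same stream-copy shape is:
cat /tmp/master.img > /tmp/clone.img

# A clone is only useful if it's bit-identical.
cmp /tmp/master.img /tmp/clone.img && echo "images match"
```

Since cat streams the whole block device byte-for-byte, the clone gets the partition table, filesystems, and bootloader in one pass — no mounting required on either end.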

In the process of imaging each of these machines, we were also able to test the efficacy of ssh’s -C compression switch. We discovered that while uncompressed streams ran at about 10–11 megabytes per second (bumping into the 100 Mbps link limit), compressed streams crawled along at only about 100 kilobytes per second. We couldn’t really measure how much the data was being compressed, but the compressed streams were clearly being soundly outpaced by the uncompressed ones. We eventually threw our hands up in disgust and restarted the compressed-stream test machine with an uncompressed stream so that it would actually finish in reasonable time.
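gzip can stand in for ssh -C to show one plausible contributor to that loss (both use zlib’s deflate): dense binary regions of a disk image barely compress, so the CPU pays full price for little or no bandwidth saved. A sketch with worst-case input:

```shell
# Incompressible input: 1 MB of random bytes, a worst case for deflate.
head -c 1048576 /dev/urandom > /tmp/rand.bin
gzip -c /tmp/rand.bin > /tmp/rand.bin.gz

# On data like this, gzip's output is actually slightly LARGER than
# its input -- pure CPU cost, zero bandwidth saved.
wc -c /tmp/rand.bin /tmp/rand.bin.gz
```

On a fast LAN the bottleneck simply moves from the wire to the CPU, which matches the 100x slowdown we saw; -C earns its keep on slow links, not on a local 100 Mbps segment.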

This worked pretty well. It made our Linksys switches want to retire. No time profiling was done. We learned that this approach isn’t too useful out of the box without some tweaks, since manual intervention was required on every machine. But for a small cluster like this, it wasn’t a problem at all.