HOWTO: Getting BOINC going on a cluster of Ubuntu machines with a custom algorithm.

I am doing research that applies peer-to-peer (P2P) techniques to distributed computing. One assertion central to my thesis is that a P2P swarm of processing nodes can perform as well as a client-server solution. To defend this claim, I need to pit my system against existing distributed computing systems. BOINC is one of the candidates I chose to collect data from when running a set of benchmarks.

 

BOINC can be a bit tricky to set up, and it took me some time to get it running properly. Here are the notes I took along the way, which may help others embarking on a similar task.

 

I take a few shortcuts here because these machines are hosted on a private network. Anyone setting up a public BOINC project should not omit the security steps that I have skipped. Please consult the official BOINC wiki for security suggestions.

 

The following instructions show how to get a non-BOINC executable running within the BOINC system and producing results, using the “wrapper” example from BOINC. This is useful if you want to run an existing algorithm rather than write a “proper” BOINC application with checkpointing, graphics, etc.

 

The algorithm I have written is a Mandelbrot set generator that takes two command line arguments:

  1. An input text file containing the settings for the part of the Mandelbrot set it is going to generate.
  2. The name of an output file for the resulting JPEG.

These instructions also contain other things specific to my setup, such as the IP address of my machine and the name of my user account. Obviously you’ll need to adjust these.

 

These instructions were written on a server machine (a nothing-fancy P3 running Ubuntu 6.06) with a full complement of development tools, and with Apache2, MySQL and PHP installed and working correctly.

 

Instructions:

 

Check out the code:
svn co http://boinc.berkeley.edu/svn/trunk/boinc
Revision 14015 was the latest revision at the time of experimentation.

 

Build the code:
cd boinc
./_autosetup
./configure --disable-client
make

 

Note for earlier versions of Ubuntu: you may need to upgrade automake and related tools to get configure to run properly. There are instructions elsewhere on this blog for upgrading automake to the latest version on older Ubuntu boxes.

 

Create the Project:
cd /data/source/boinc/tools/
./make_project --url_base http://144.6.40.251/ --db_host localhost --db_user root mandelbrot16

 

Note: url_base is just that: the base URL under which the project is placed in a directory of its own. So the command above produces a project URL of http://144.6.40.251/mandelbrot16/
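As a minimal sketch of how the two values combine (the variable names are mine and purely illustrative; make_project does not read them):

```shell
#!/bin/sh
# Illustrative only: shows how the project URLs follow from url_base
# and the project name used in this post.
url_base="http://144.6.40.251/"   # value passed to --url_base
project="mandelbrot16"            # last argument to make_project

# Strip one trailing slash from url_base, then append the project name.
project_url="${url_base%/}/${project}/"
# The ops page used later in this post lives next to it.
ops_url="${url_base%/}/${project}_ops/"

echo "$project_url"   # prints http://144.6.40.251/mandelbrot16/
echo "$ops_url"       # prints http://144.6.40.251/mandelbrot16_ops/
```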

 

Set up the Project Website:

 

sudo cp mandelbrot16.httpd.conf /etc/apache2/sites-available/
sudo ln -s /etc/apache2/sites-available/mandelbrot16.httpd.conf /etc/apache2/sites-enabled/mandelbrot16.httpd.conf
sudo apache2ctl restart
cd ~/projects/mandelbrot16
sudo chown www-data ~/projects/ -R

 

Note: remember that this is for Ubuntu. These instructions won’t work as written if your Linux distribution runs Apache under a different user name than www-data.

 

Enable Account Creation:

 

Edit config.xml in the ~/projects/mandelbrot16/ directory and set the disable_account_creation element to 0.
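If you would rather script this edit than open an editor, something like the following should do it. This is only a sketch: the function name is hypothetical, and it assumes the element sits on a single line in config.xml.

```shell
#!/bin/sh
# Sketch: set <disable_account_creation> to 0 in a BOINC config.xml.
# Assumes the opening and closing tags are on the same line.
enable_account_creation() {
    config_file="$1"
    tmp_file="${config_file}.tmp"
    # Rewrite whatever value is currently inside the element to 0,
    # writing to a temporary file first and then moving it into place.
    sed 's|<disable_account_creation>[^<]*</disable_account_creation>|<disable_account_creation>0</disable_account_creation>|' \
        "$config_file" > "$tmp_file" && mv "$tmp_file" "$config_file"
}

# Usage (path from this post; adjust for your project):
# enable_account_creation ~/projects/mandelbrot16/config.xml
```

Because the project directory was handed over to www-data in the previous step, you may need to run this with sudo.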

 

Edit the Project:

 

In this example we’re only using Linux machines, so pare down project.xml to include just one target platform.

 

Also, add the name of the algorithm executable. In this case it’s mandelbrot.

 

<boinc>
    <platform>
        <name>i686-pc-linux-gnu</name>
        <user_friendly_name>Linux running on an Intel x86-compatible CPU</user_friendly_name>
    </platform>
    <app>
        <name>mandelbrot</name>
        <user_friendly_name>mandelbrot16</user_friendly_name>
    </app>
</boinc>

Edit config.xml:

 

Make sure the db_passwd element contains your database password.

 

Get the code for Wrapper and Compile It:

 

You can’t just run any old executable in BOINC and expect it to work (I know, I tried). BOINC applications need to work with the BOINC framework, link to the BOINC libraries, etc. To get around this, there is an example program called wrapper which allows you to run an arbitrary executable. This code is not within the main source tree; it’s in the samples.

 

svn co http://boinc.berkeley.edu/svn/trunk/boinc_samples
cd boinc_samples/wrapper
ln -s `g++ -print-file-name=libstdc++.a`
make
cd ../..
mkdir -p ~/projects/mandelbrot16/apps/mandelbrot/
cp boinc_samples/wrapper/wrapper ~/projects/mandelbrot16/apps/mandelbrot/wrapper_5.5_i686-pc-linux-gnu

 

Copy the Algorithm:

 

Also make sure to name your algorithm according to BOINC’s naming convention so that it matches the target platform set out in project.xml.

 

mkdir -p ~/projects/mandelbrot16/apps/mandelbrot/
cp /data/source/sp2p/experiments/mandelbrot/160/comptorrents/working/mandelbrot/mandelbrot ~/projects/mandelbrot16/apps/mandelbrot_5.5_i686-pc-linux-gnu
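Judging by the two file names used in this post, the naming pattern is the application name, a version number and the platform name joined by underscores. A small sketch that builds such names (the function name is hypothetical, and the pattern is inferred from my own file names rather than quoted from the BOINC documentation):

```shell
#!/bin/sh
# Sketch: build a versioned file name of the form <app>_<version>_<platform>.
versioned_name() {
    app="$1"
    version="$2"
    platform="$3"
    printf '%s_%s_%s\n' "$app" "$version" "$platform"
}

# The two names used in this post:
versioned_name mandelbrot 5.5 i686-pc-linux-gnu
versioned_name wrapper 5.5 i686-pc-linux-gnu
```

The platform part must match the platform name listed in project.xml.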

 

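Then, from the project directory (~/projects/mandelbrot16), add the application and its versions to the database:

bin/xadd
bin/update_versions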

 


Add Templates for Work Units and Results:

 

Put the following into a file called result_template in ~/projects/mandelbrot16/templates/

<file_info>
    <name><OUTFILE_0/></name>
    <generated_locally/>
    <upload_when_present/>
    <max_nbytes>100000000</max_nbytes>
    <url><UPLOAD_URL/></url>
</file_info>
<result>
    <file_ref>
        <file_name><OUTFILE_0/></file_name>
        <open_name>out</open_name>
        <copy_file/>
    </file_ref>
</result>

Put the following into a file called work_unit_template in ~/projects/mandelbrot16/templates/

 

<file_info>
    <number>0</number>
</file_info>
<workunit>
    <file_ref>
        <file_number>0</file_number>
        <open_name>in</open_name>
        <copy_file/>
    </file_ref>
    <delay_bound>6000000</delay_bound>
    <rsc_fpops_bound>9999999999999999999999999999999999999999999999999999</rsc_fpops_bound>
    <rsc_fpops_est>9999999999999999999999999999999999999999999999999999</rsc_fpops_est>
</workunit>

 

Note: You may not want your algorithm to be allowed quite this long to execute. Consult the BOINC documentation for the delay_bound, rsc_fpops_bound and rsc_fpops_est options and then make an informed judgement as to their appropriate values. As for me, I just want it to do the work and take as long as it wants, because I have full control over all of the machines in my cluster. This is not the common situation with BOINC projects, so please adjust these values for a “real” project.

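To get a feel for how generous the delay_bound above is, here is a quick conversion. It is a sketch that relies on delay_bound being expressed in seconds; the variable names are mine.

```shell
#!/bin/sh
# delay_bound is given in seconds; convert the value used above to whole days.
delay_bound=6000000
seconds_per_day=86400

# Integer division, so the result is rounded down.
delay_days=$((delay_bound / seconds_per_day))
echo "delay_bound of ${delay_bound}s is about ${delay_days} days"
```

So each result is allowed more than two months to come back, which is far longer than a public project would normally want.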
Add Work Units:

 

The mandelbrot executable takes 2 command line arguments: an infile and an outfile.

 

The infile stipulates the parameters for the mandelbrot set being generated.

 

The outfile is the resulting JPEG.

 

For this example, there are 16 work units, each a file containing the settings for the overall Mandelbrot set as well as the region of the set to be calculated (so it can be split up and computed over multiple machines).

 

Copy each of these files into the ~/projects/mandelbrot16/download directory.

 

Now tell BOINC about each of these files by inserting each one as a work unit (using the templates from the previous step).

 

bin/create_work -appname mandelbrot -wu_name mandelbrot_00000001 -wu_template templates/work_unit_template -result_template templates/result_template -min_quorum 1 -target_nresults 1 mandelbrot_00000001

 

Repeat this for each data set file, changing the file and work unit name each time, e.g. mandelbrot_00000002, mandelbrot_00000003 and so on.
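Typing this 16 times gets old quickly, so a loop helps. The sketch below only prints the commands (a dry run) so you can check them first. The function names are hypothetical, and it assumes, as in my setup, that each input file in the download directory has the same name as its work unit.

```shell
#!/bin/sh
# Sketch: print one create_work command per work unit (dry run).

wu_name() {
    # Zero-pad the index to 8 digits: 1 -> mandelbrot_00000001
    printf 'mandelbrot_%08d\n' "$1"
}

print_create_work_commands() {
    count="$1"
    i=1
    while [ "$i" -le "$count" ]; do
        name=$(wu_name "$i")
        # Same options as the single command above; only the name changes.
        echo "bin/create_work -appname mandelbrot -wu_name $name -wu_template templates/work_unit_template -result_template templates/result_template -min_quorum 1 -target_nresults 1 $name"
        i=$((i + 1))
    done
}

print_create_work_commands 16
```

Once the output looks right, pipe it to sh from the project directory to actually create the work units.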

 

Start BOINC:

 

sudo bin/start

 

You can also stop it (sudo bin/stop) and get its status (sudo bin/status).

 

Create a Client User Account:

 

Go to the project homepage (http://144.6.40.251/mandelbrot16/) and create an account.

 

If the server does not have public Internet access, as is often the case on a dedicated cluster, this page might take a while to load as it tries to fetch user info from the main BOINC site. You can hack this out by editing the file user.inc (home/bcg/projects/test1/html/inc/user.inc). Modify the function get_other_projects at line 60 to return $user and do nothing else, e.g.:

 

function get_other_projects($user) {
    /*
    $cpid = md5($user->cross_project_id . $user->email_addr);
    $url = "http://boinc.netsoft-online.com/get_user.php?cpid=$cpid";
    $f = fopen($url, "r");
    if (!$f) {
        return $user;
    }
    $u = parse_user($f, $user);
    fclose($f);
    return $u;
    */
    return $user;
}

 

Since this is a dedicated cluster, you (like me) will probably not bother setting up email on the servers. That means you need to edit the created user’s database entry by hand, and also look up the authentication key so that command line clients can attach to the project. You don’t need to do this if your client machines are going to run the BOINC GUI client. In my case I am controlling 16 machines at the same time using cssh, so I need to use the BOINC command line tools. At this stage, I do not know how to attach them using a username and password like the GUI client does. If anyone knows how to do this, please leave a comment on this post!

 

Connect to the database using phpMyAdmin or similar. Go to the mandelbrot16 database and browse the rows in the user table.

 

Take note of the authenticator field value (e.g. 84a35ba7615192bd3120019d8861ffac). You will need it to attach the client shortly.

 

Update the email_validated field to contain a 1.
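If you prefer the command line to phpMyAdmin, the same two steps can be written as SQL. This is a sketch: the function name is hypothetical, and it assumes the account can be found by the email_addr column (the field the PHP code above reads) and that you use the address you signed up with.

```shell
#!/bin/sh
# Sketch: print the SQL for the two manual steps above, for one account.
# Looking the row up by email_addr is an assumption; adjust to taste.
user_sql() {
    email="$1"
    # Step 1: look up the authenticator for the account.
    printf "SELECT authenticator FROM user WHERE email_addr='%s';\n" "$email"
    # Step 2: mark the email address as validated.
    printf "UPDATE user SET email_validated=1 WHERE email_addr='%s';\n" "$email"
}

user_sql "me@example.com"

# To run it against the project database (assumes the mysql command line
# client and the root database user given to make_project):
# user_sql "me@example.com" | mysql -u root -p mandelbrot16
```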

 

Set Up Client Software:

 

Download boinc_5.10.21_i686-pc-linux-gnu.sh from the BOINC website.

 

Run it on the client machine from your home directory. This will install the runtime files into a BOINC directory.

 

In a terminal run:

 

./run_client

 

Attach to the Project:

 

In a separate terminal window:

 

./boinc_cmd --project_attach http://144.6.40.251/mandelbrot16/ cc5e5948e2ff9fe00dc2474d271753ad

 

Use your own authenticator here, the one you noted down earlier.

 

You can also detach from the project later on by running:

 

./boinc_cmd --project http://144.6.40.251/mandelbrot16/ detach

 

That should be it! You should be able to go to your ops page for your project (in this case http://144.6.40.251/mandelbrot16_ops/) and see results coming in.

 

I am by no means a BOINC expert, so if you find any better ways of doing what I have done, please leave a comment or drop me a line and let me know.

 

Coming Soon: A set of instructions to do exactly the same thing with a Beowulf cluster.

 

REFERENCES WHICH HELPED ME:

 

Building distributed applications with BOINC (PDF) (HIGHLY RECOMMENDED READING FOR BEGINNERS)

 

BOINC Project Creation Cookbook

4 Responses to “HOWTO: Getting BOINC going on a cluster of Ubuntu machines with a custom algorithm.”

  1. Bastian Says:

    Hi this sounds very interesting I will give it a try. Did you check about the reliability of boinc and the local cluster network.

    greetings Bastian

  2. Huiping Yao Says:

    Hi
    When I used wrapper/worker, I ran into a problem. After executing bin/status, I found that the daemon sample_work_generator is not running.
    The messages in sample_work_generator.log are as follows:
    [2008/07/11 21:35:01] Executing command: sample_work_generator -d 3
    Unrecognized XML in SCHED_CONFIG::parse: cgi_url
    Skipping: http://192.168.151.75/boinc080710_cgi/
    Skipping: /cgi_url
    Unrecognized XML in SCHED_CONFIG::parse: disable_account_creation
    Skipping: 0
    Skipping: /disable_account_creation
    Unrecognized XML in SCHED_CONFIG::parse: log_dir
    Skipping: /var/www/boinc_projects/log_debian
    Skipping: /log_dir
    Unrecognized XML in SCHED_CONFIG::parse: app_dir
    Skipping: /var/www/boinc_projects/apps
    Skipping: /app_dir
    Unrecognized XML in SCHED_CONFIG::parse: host
    Skipping: debian
    Skipping: /host
    Unrecognized XML in SCHED_CONFIG::parse: show_results
    Skipping: 1
    Skipping: /show_results
    2008-07-11 21:35:02.2927 [normal ] Starting
    2008-07-11 21:35:02.2928 [debug ] Making 49 jobs
    Too few input files given; need at least 2
    process_wu_template: -112
    2008-07-11 21:35:02.2932 [CRITICAL] can’t make job: -112

    Is it because sample_work_generator.c is written for the uppercase sample application? Or is there another reason?

    Any idea is appreciated.

  3. bcg Says:

    Best thing you can do is join the BOINC mailing list and ask over there. They are more up-to-date with latest versions of the software.

    However, all the unrecognised XML errors suggest there is a problem there. It seems your xml is not correct for what the wrapper is expecting. Compare your files to mine and see if you have missed some parameters or misspelt something.

  4. Huiping Says:

    hi,

    The problem I asked about last time has been solved, and the wrapper/worker is running normally.
    I want to ask you a question about creating work units programmatically. If the project needs the wrapper to call a legacy program, what should I do next after running ./make_project and modifying sample_work_generator.C?

    Any idea is appreciated.
