Tuesday, March 15, 2011

Projects for Distributed Systems Lab

These projects are fundamental for creating a computational ecosystem in which the vast majority of network management tasks can be automated. For portability, the services and tools must be deployed on virtual machines.

Network services
  • DHCP Server
  • DNS Server
Deployment services
  • Fedora mirror
  • Cobbler/koan
Services to enable remote access
  • Tools to support remote monitoring via a remote terminal as well as a browser interface
    NOTE: the wake-on-LAN service would be enabled on all machines in the lab
Repository services
  • Git tool
Security tool
  • Search for, evaluate, and select a network intrusion detection tool
There would be two operating systems in the lab:
  • CentOS would be the Linux distro on which the cobbler/koan service would reside
  • Ubuntu Server would be used to deploy the DNS, DHCP, and Git programs, along with a network intrusion detection tool.
Repositories for both environments would be required. Only the latest releases would be maintained.


Thursday, March 10, 2011

Text-based network monitoring tools...

I want to share some projects that I have found very useful when you need to monitor your network activity. These projects are not very sophisticated, but they are great for figuring out how, and by whom, your network bandwidth is being consumed.
  • iptstate: this program presents the connections "observed" by iptables
  • speedometer: a Python script that shows in real time how much bandwidth a network interface is consuming. It discriminates between download and upload bandwidth consumption. How to use it: 'speedometer -rx <network-interface> -tx <network-interface>'
  • pktstat: "displays a real-time list of active connections seen on a network interface, and how much bandwidth is being used". How to use it: 'pktstat -i <network-interface>'
<network-interface> could be wlan0, eth0, ppp0, and so on.
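If you are not sure what your interface is called, a little helper of my own (not part of any of these tools) can list them and print a ready-to-paste pktstat command for each one:

```shell
# List every network interface the kernel exposes under /sys/class/net
# and print the corresponding pktstat invocation for copy-pasting.
for dev in /sys/class/net/*; do
  [ -e "$dev" ] || continue            # skip if no interfaces are exposed
  echo "pktstat -i $(basename "$dev")"
done
```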


Wednesday, March 2, 2011

Booting order during the deployment process... [CETA-CIEMAT]

When deploying a computing ecosystem that provides grid computing services with the OCD-gLite script, the boot order is: first the worker nodes (WN), then the computing element (CE).
This is because passwordless access is required for job submission; those permissions are set up by the CE, which needs all the worker nodes to be up and running.

Submitting jobs to Maui

For submitting jobs from the computing element
  • Log in as a user other than 'root'. In particular, when a computing element has been deployed with OCD-gLite, it comes with many users. Try 'ceta000'.
  • Create a basic bash script
    #!/bin/bash
    /bin/hostname
    Make it executable with "chmod +x script.sh", assuming the script has been named 'script.sh'
  • Submit the job by executing 'qsub -q ceta script.sh'. '-q ceta' indicates that this job will be enqueued in a queue named 'ceta'.
  • If everything works as expected, a file called 'script.sh.o0' should contain the hostname of the worker node that ran the job. ;-)
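The whole session can be run in one go, something like the sketch below. Keep in mind that the queue name 'ceta' and the user 'ceta000' are specific to that site; adjust them for your installation.

```shell
# Create the test job script, make it executable, and try it locally
# before handing it to the batch system.
cat > script.sh <<'EOF'
#!/bin/bash
/bin/hostname
EOF
chmod +x script.sh
./script.sh                    # quick local check: prints this host's name
# qsub -q ceta script.sh       # then submit it to the 'ceta' queue
```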
These notes were taken during my stay at CETA-CIEMAT, and the information provided here is only relevant to the usage scenarios encountered there.

Building dependencies for a Fedora package...

These days I have been "engaged" with a project named libguestfs. My box runs Fedora 13, while the current version is Fedora 14. Therefore some of my packages are outdated, and particular software requirements have to be met. To install 'libguestfs' on my system, with the whole software ecosystem it needs, this yum command is very useful.

# yum-builddep libguestfs

As its name suggests, it installs all the build dependencies for the libguestfs package.

When you need to install a package that requires a lot of dependencies and you want to avoid the cumbersome process of installing them one by one, try this command first.
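One detail worth knowing: yum-builddep is not part of core yum; it ships in the yum-utils package. A quick check like this sketch tells you whether you need to install it first:

```shell
# yum-builddep lives in the yum-utils package, so check for it before
# relying on the one-shot command.
if command -v yum-builddep >/dev/null 2>&1; then
  echo "yum-builddep available"
else
  echo "install yum-utils first: yum install -y yum-utils"
fi
```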

Wednesday, February 23, 2011

Some additional web links relevant to Condor

  • An interesting link describing how to work with Java and Condor.
  • Different examples with their corresponding submission files.
  • Another interesting site describing how to use Condor.
  • Another interesting link for rookie Condor users.


Enabling the "Parallel" universe in Condor

If you have used our script (check the last reference) to deploy Condor, then you need to modify your local configuration files (condor_config.local) at the master and worker nodes as follows:

Master:

UNUSED_CLAIM_TIMEOUT = 0
MPI_CONDOR_RSH_PATH = $(LIBEXEC)
ALTERNATE_STARTER_2 = $(SBIN)/condor_starter
STARTER_2_IS_DC = TRUE
SHADOW_MPI = $(SBIN)/condor_shadow

Worker Nodes:

DedicatedScheduler = "DedicatedScheduler@PASTE_SUBMIT_NODE_NAME_HERE" 
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
MPI_CONDOR_RSH_PATH = $(LIBEXEC)
CONDOR_SSHD = /usr/sbin/sshd
CONDOR_SSH_KEYGEN = /usr/bin/ssh-keygen
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
# Dedicated Node.
START=TRUE

In addition, I attach a submission file for running parallel jobs.

should_transfer_files = Yes
when_to_transfer_output = ON_EXIT_OR_EVICT
universe = parallel
executable = /bin/hostname
+ParallelShutdownPolicy = "WAIT_FOR_ALL"
machine_count = 3
log        = hostLog
Output     = hostOut.$(Node).$(Process)
Error      = hostErr.$(Node).$(Process)
queue

The '+ParallelShutdownPolicy = "WAIT_FOR_ALL"' line forces Condor to wait until every node in the parallel job has completed.
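Saving the file above as, say, 'parallel.sub' (the filename is my choice), the job is handed to the dedicated scheduler with the usual Condor commands. Shown here as a dry run; remove the echo wrapper to actually talk to the daemons:

```shell
# Dry-run sketch: print the commands instead of executing them.
# Swap 'echo' out of run() to submit for real.
run() { echo "+ $*"; }
run condor_submit parallel.sub   # hand the job to the dedicated scheduler
run condor_q                     # watch the 3 parallel nodes start
```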

References