Last week, I attended the Open Science Grid All-Hands Meeting held in Lincoln, Nebraska. There, I had the opportunity to talk with José Caballero, who later sent me an e-mail describing a couple of projects. I won't share the e-mail's content, only a summary of my best understanding of the problems that need to be addressed.
But what is grid computing? Grid computing is a large-scale computational platform whose resources are scattered geographically and shared by institutions around the globe through secure, standardized mechanisms. Storage and processing are the resources most frequently shared by grid users. Links: www.gridcafe.org, www.globus.org.
Note that I bolded three words: secure, institutions, and processing. Let's look at each of them in detail:
- Institutions. Computational resources are deployed in different centers or buildings. People with the proper rights can access these resources according to the policies established by the resource owners.
- Secure. Information and physical resources are shared in a secure way. Following well-known authentication mechanisms, users and resources are identified in such a way that secure interactions can be performed between them. http://gdp.globus.org/gt3-tutorial/multiplehtml/ch10s04.html
- Processing. Any resource attached to a computational device is a good candidate for being shared in a grid platform. In particular, some institutions have hundreds or even thousands of cores that are underutilized most of the time. Grid computing has defined and implemented protocols for sharing processing units among users who belong to digital communities, known as Virtual Organizations (VOs). Check this picture.
What is the problem with job submission in a grid environment? It's not an easy task for end users:
- A user must obtain an X.509 grid certificate.
- He/she needs to learn the basic commands for interacting with Condor-based clusters.
- The mechanisms for prioritizing job execution and for resource brokering are often hard-coded in the grid meta-schedulers.
A software solution that tackles all of these problems is hard to find. Some solutions address them partially, e.g. glideinWMS (1st bullet) and PanDA (2nd and 3rd bullets). The main problem with PanDA is that it is tightly coupled to ATLAS, the particle physics experiment at CERN's Large Hadron Collider for which it was developed. Other VOs badly need a software tool that supports all these features.
The main goal of this project is to create a highly customizable web application in which different features can be changed "on the fly", for instance:
- Scheduling algorithms
- Priority policies
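To make "changed on the fly" a bit more concrete, here is a minimal sketch of how priority policies could be swapped at runtime. Everything in it is an assumption for illustration: the `Job` fields, the policy names `fifo` and `fair_share`, and the `Scheduler` class are hypothetical and are not taken from PanDA or glideinWMS.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Job:
    name: str
    submitted_at: int   # hypothetical: submission order (lower = earlier)
    user_share: float   # hypothetical: weight of the submitting user


# A policy maps a job to a sort key; jobs with lower keys run first.
Policy = Callable[[Job], float]

# Hypothetical registry of policies; new ones can be added at runtime.
POLICIES: Dict[str, Policy] = {
    "fifo": lambda job: job.submitted_at,
    "fair_share": lambda job: -job.user_share,
}


class Scheduler:
    def __init__(self, policy_name: str = "fifo") -> None:
        self.set_policy(policy_name)

    def set_policy(self, policy_name: str) -> None:
        """Switch the active policy without restarting the application."""
        if policy_name not in POLICIES:
            raise ValueError(f"unknown policy: {policy_name}")
        self.policy_name = policy_name

    def order(self, jobs: List[Job]) -> List[Job]:
        """Return the jobs in the order the active policy would run them."""
        return sorted(jobs, key=POLICIES[self.policy_name])
```

In a web application, `set_policy` would be called from an admin page or a configuration endpoint, so that a VO can change its policy without touching the meta-scheduler code.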
The web application should be able to use any SQL database as its back-end. That database stores the representation of a job. For instance, PanDA stores the following information to represent a job:
- Binary filename. This is the actual program that should be executed remotely. It can be given as a full path or as a URL.
- Argument list.
- Hash list. Some sample keys in this hash list are:
  - siteid, a name that represents a queue where this job can be queued
  - queue, the name of a compute element
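The job representation above could be stored along these lines. This is only a sketch under my own assumptions: the table and column names are hypothetical (this is not PanDA's actual schema), SQLite stands in for "any SQL database", and the argument list and hash list are serialized as JSON text to keep the example short.

```python
import json
import sqlite3
from typing import Dict, List


def create_schema(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per job.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               executable TEXT NOT NULL,  -- full path or URL of the binary
               arguments TEXT NOT NULL,   -- JSON-encoded argument list
               attributes TEXT NOT NULL   -- JSON-encoded hash list
           )"""
    )


def store_job(conn: sqlite3.Connection, executable: str,
              arguments: List[str], attributes: Dict[str, str]) -> int:
    """Insert a job and return its generated id."""
    cur = conn.execute(
        "INSERT INTO jobs (executable, arguments, attributes) VALUES (?, ?, ?)",
        (executable, json.dumps(arguments), json.dumps(attributes)),
    )
    conn.commit()
    return cur.lastrowid


def load_job(conn: sqlite3.Connection, job_id: int) -> dict:
    """Read a job back and decode its argument list and hash list."""
    row = conn.execute(
        "SELECT executable, arguments, attributes FROM jobs WHERE id = ?",
        (job_id,),
    ).fetchone()
    if row is None:
        raise KeyError(job_id)
    return {
        "executable": row[0],
        "arguments": json.loads(row[1]),
        "attributes": json.loads(row[2]),
    }
```

For example, a job could be stored with `store_job(conn, "/usr/bin/env", ["python", "run.py"], {"siteid": "SITE_A", "queue": "ce.example.org"})`, where the site and queue values are made up.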
Additional notes:
- The web application can run on a different machine from the one where the job runs.
- cURL is the tool that provides secure interaction between the web application and the execution machine.
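As a rough idea of what that cURL interaction might look like, the sketch below only builds a curl command line that authenticates with an X.509 certificate. The flags used (`--cert`, `--key`, `--capath`, `--header`, `--data`, `--silent`, `--fail`) are standard curl options, but the URL, the file paths, and the idea of sending the job as a JSON payload are my assumptions, not something specified by the project.

```python
import json
from typing import Dict, List


def build_curl_command(url: str, cert_path: str, key_path: str,
                       ca_dir: str, payload: Dict[str, object]) -> List[str]:
    """Build (but do not run) a curl call that uses client-certificate auth."""
    return [
        "curl",
        "--silent", "--fail",          # quiet output, non-zero exit on HTTP errors
        "--cert", cert_path,           # the user's X.509 certificate
        "--key", key_path,             # the matching private key
        "--capath", ca_dir,            # directory with trusted CA certificates
        "--header", "Content-Type: application/json",
        "--data", json.dumps(payload), # hypothetical JSON job description
        url,                           # hypothetical HTTPS endpoint
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing the arguments as a list rather than a single shell string avoids quoting problems with the JSON payload.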