Monitoring in Grid
* This work is supported in part by the National Science Council, Taiwan (ROC), under grants no. NSC 96-2221-E-029-019-MY3 and NSC 95-2218-E-007-025.
† Corresponding author.
The software stack of the system comprises three layers, built bottom-up: a bottom layer, a middle layer, and a top layer. Each layer is described below.
Bottom Layer: This layer consists mainly of Nodes; each Node in the Grid is built on the software stack shown in Figure 2. The layer contains two main blocks. The first is the Information Provider, which gathers machine information about Nodes, such as the number of processors/cores, the processor load, the free/total size of memory, and the disk usage; Ganglia serves as the Machine Information Provider in this system. Because the essence of a Grid is connecting Nodes over the Internet, network information among Nodes, such as bandwidth and latency, is also essential; NWS serves as the Network Information Provider. The second block is the Grid Middleware, which joins Grid Nodes together; MPICH-g2 [4], which is compatible with GT, is required for running parallel applications on the Grid.

Figure 2. The software stack of all Nodes

Middle Layer: This layer consists mainly of Sites. Its software stack is shown in Figure 3. Each Site consists of several Nodes that are located in the same place or connected to the same switch/hub, and the Nodes in a Site should be connected to each other by Internet. Moreover, each Site is usually built as a cluster in which each Node has a real IP, and the first Node of a Site is called its head Node. The construction of this layer is related to the domain-based network information model that will be described later.

Top Layer: The core of this layer consists of two blocks, Resource Broker and Monitoring Service, as shown in Figure 3. The Monitoring Service provides a web front-end for users to observe variation during the progress of jobs. In addition, users can specify the time span to display for particular Nodes or for particular links in a domain; the front-end was developed on top of the Ganglia and NWS tools.

Figure 3. The software stack of all Sites and the Service

4. System implementation

4.1. Information Aggregator

The main phases of Resource Broker are Resource Discovery, Application Modeling, Information Gathering, System Selection, and Job Execution [14]. The Information Gathering phase aggregates machine and network information so that Resource Broker can make a suitable match between jobs and resources. For this purpose, this work devises two services, Information Service and Monitoring Service. Information Service gathers the machine and network information and stores it in a database; Monitoring Service provides a Web front-end page for users to observe variation during job execution. Figure 4 depicts the architectures of Information Service and Monitoring Service and their relation to Resource Broker.

The primary purpose of Information Service is to collect related resource information (processors, memory, disk, and network bandwidth) of all machines in the Grid and provide the analyzed information. Its components and their relations are described as follows:

• Agent: It is the primary component of Information Service and serves as its contact window. Both the Scheduler of Resource Broker and the Controller of Monitoring Service need real-time or estimated information about machines. For example, assume that users ask Resource Broker for the list of machines with low CPU loading. First, Resource Broker sends a Request to Agent. After Agent receives the Request, it uses Getter and Setter to get the required information, which is returned to Agent. Then, Agent sends it to Resource Broker. After the task is finished, Resource Broker delivers the related data recorded during execution to Agent, including number of used CPUs,
execution time, disk space usage, memory usage, task requirement, etc. These historical data are stored in Message Center. Predictor can then analyze these data and report a suggested machine list to Agent.

• Getter and Setter: Information collection and data access can occur at any time, so database operations are frequent. To unify information access and reduce redundant program development, Getter and Setter are designed and placed at the front end of Message Center to control access to it.

• Message Center: This component is mainly used to store native information from the Grid, including CPU Load, Memory Free, Disk Usage, Network Information, etc. It also holds observation data of Job Execution and the prediction data produced by Predictor.

• Gather: The MDS service of Globus can collect resource information such as CPU speed, number of CPUs, CPU loading, memory size, available memory, disk space usage, and network interface information. The NWS tool is used to collect the current network bandwidth. The Getter and Setter component then stores this information in Message Center for future use.

• Predictor: This component has two functions. One is to periodically get native information from Message Center. By modular design, different types of native information are handled by different prediction models, and the predictions are stored in Message Center for future use. The other is to accept a Request from Agent to predict and return the required results, which increases system flexibility and supports more applications.

The goal of Monitoring Service is to acquire the information maintained by Information Service and present it in graphical form. The tasks and relations of its components are described as follows.

• Controller: It is the main component of Monitoring Service, and its primary task is to control the behavior of Monitoring Service, including Grid Nodes configuration and parameter setting. Controller periodically gets native information of Nodes from the Agent of Information Service and then sends parameters and data to Drawer for illustration.

• Drawer: It receives parameters and data from Controller and draws the figures, which Displayer then presents. The drawing functions need to be flexible: appropriate figures must be drawn according to information type.

• Displayer: This component provides a query mechanism for users to observe historical data of Grid Nodes through a web interface. It will be integrated into the Portal so that users can query conveniently.

Figure 4. Architecture of Information Service and Monitoring Service

Ganglia is a scalable distributed monitoring system for monitoring the status of hosts in clusters or Grids. It provides a PHP Web front-end for administrators to view cluster or Grid status information in real time. The default information includes metrics such as processor load, memory usage, network (bytes input/output), and disk utilization. In practice, the administrator needs more flexible and varied operations from the Web front-end. For this purpose, this work developed a system that satisfies these needs and is compatible with Ganglia. The main steps are as follows:

1. Dump the contents of an RRD file [17, 20] to XML format: The following shell script dumps each RRD file to XML format.

    #!/bin/sh
    # Dump every RRD file in the current directory to XML.
    for i in *.rrd
    do
        rrdtool dump "${i}" > "${i}.xml"
    done

2. Convert the XML output of an RRD file to JRobin RrdDb format: RrdDb is a JRobin class that provides a constructor for creating a new RRD from an XML dump. The conversion method is listed below.

    import java.io.IOException;
    import org.jrobin.core.RrdDb;
    import org.jrobin.core.RrdException;

    /* Create <name>.jrrd from the XML dump <name>.xml. */
    public static void xml2JRrd(String name) throws IOException, RrdException {
        String xml = name + ".xml";
        String jrrd = name + ".jrrd";
        RrdDb db = new RrdDb(jrrd, xml);
        db.close();
    }

3. Render the graph of JRobin RrdDb by RrdGraphDef: The following code shows an example of rendering an image from a JRobin
RrdDb that contains processor load information; the output graph of CPU loading is shown in Figure 5.

        /* start of RrdGraphDef */
        RrdGraphDef def = new RrdGraphDef();

        /* definition of graph */
        def.setMaxValue(100);
        def.setMinValue(0);
        def.setRigid(true);
        def.setVerticalLabel("Percent");
        def.setTimeSpan(start_time, end_time);
        def.setTitle(hostname + " CPU last " + getRange(end_time - start_time));
        def.setAntiAliasing(true);
        def.setFilename(img_file);
        def.setWidth(WIDTH);
        def.setHeight(HEIGHT);
        def.setLazy(LAZY);

        /* definition of datasource */
        def.datasource("cpu_user", rd + "/cpu_user.rrd.jrrd", "sum", "AVERAGE");
        def.datasource("cpu_nice", rd + "/cpu_nice.rrd.jrrd", "sum", "AVERAGE");
        def.datasource("cpu_system", rd + "/cpu_system.rrd.jrrd", "sum", "AVERAGE");
        def.datasource("cpu_wio", rd + "/cpu_wio.rrd.jrrd", "sum", "AVERAGE");
        def.datasource("cpu_idle", rd + "/cpu_idle.rrd.jrrd", "sum", "AVERAGE");

        /* stacked areas, each followed by its average in the legend */
        def.area("cpu_user", CPU_USER, "User CPU");
        def.gprint("cpu_user", "AVERAGE", " avg: %6.2f\\l");
        def.stack("cpu_nice", CPU_NICE, "Nice CPU");
        def.gprint("cpu_nice", "AVERAGE", " avg: %6.2f\\l");
        def.stack("cpu_system", CPU_SYSTEM, "System CPU");
        def.gprint("cpu_system", "AVERAGE", "avg: %6.2f\\l");
        def.stack("cpu_wio", CPU_WIO, "Wait CPU");
        def.gprint("cpu_wio", "AVERAGE", " avg: %6.2f\\l");
        def.stack("cpu_idle", CPU_IDLE, "Idle CPU");
        def.gprint("cpu_idle", "AVERAGE", " avg: %6.2f\\l");
        def.comment("- " + new Date() + " - \\r");

        /* do render graph */
        RrdGraph rrdGraph = new RrdGraph(def);
    }

Figure 5. A CPU load visual graph of Node gamma2

Furthermore, this work developed a system, compatible with NWS, that satisfies the above needs for extracting network bandwidth. The main steps of rendering a network information graph with JRobin are as follows:

1. Create a JRobin RrdDb for a domain: Each JRobin RrdDb file is responsible for one domain, and each file is created by using the constructor to create a new RRD object from the definition.

    public void addRrd(String domainname, String[] links) throws RrdException {
        String jrrd = domainname + ".jrrd";
        String head = null;
        String tail = null;

        RrdDef def = new RrdDef(this.nwsrrds_root + "/" + jrrd);
        def.setStep(this.step);

        // two data sources per link, suffixed _b and _l
        for (int i = 0; i < links.length; i++) {
            head = links[i].split(" ")[0];
            tail = links[i].split(" ")[1];
            def.addDatasource(head + "." + tail + "_b", "GAUGE", 600, 0.0, Double.NaN);
            def.addDatasource(head + "." + tail + "_l", "GAUGE", 600, 0.0, Double.NaN);
        }

        def.addArchive("MIN", 0.5, 1, 603);
        def.addArchive("MIN", 0.5, 6, 603);
        def.addArchive("MIN", 0.5, 24, 603);
        def.addArchive("MIN", 0.5, 288, 800);

        def.addArchive("AVERAGE", 0.5, 1, 603);
        def.addArchive("AVERAGE", 0.5, 6, 603);
        def.addArchive("AVERAGE", 0.5, 24, 603);
        def.addArchive("AVERAGE", 0.5, 288, 800);

        def.addArchive("MAX", 0.5, 1, 603);
        def.addArchive("MAX", 0.5, 6, 603);
        def.addArchive("MAX", 0.5, 24, 603);
        def.addArchive("MAX", 0.5, 288, 800);
    }
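The retention implied by the archive definitions in addRrd can be checked with simple arithmetic: each archive holds step × steps-per-row × rows seconds of history. The sketch below is illustrative only. The class and method names are hypothetical, and the 300-second step is an assumption, since the listing only shows this.step.

```java
class ArchiveSpan {
    /** Seconds of history held by one archive: step * stepsPerRow * rows. */
    static long spanSeconds(long stepSeconds, int stepsPerRow, int rows) {
        return stepSeconds * stepsPerRow * rows;
    }

    public static void main(String[] args) {
        // ASSUMPTION: a 300-second step; the source does not state this.step.
        long step = 300L;
        // {stepsPerRow, rows} pairs as passed to addArchive in addRrd.
        int[][] archives = { {1, 603}, {6, 603}, {24, 603}, {288, 800} };
        for (int[] a : archives) {
            long seconds = spanSeconds(step, a[0], a[1]);
            System.out.println("steps=" + a[0] + " rows=" + a[1]
                    + " -> " + (seconds / 86400.0) + " days");
        }
    }
}
```

Under that assumed step, the four archives of each consolidation function would cover roughly 2.1, 12.6, 50.25, and 800 days; a different step scales all four proportionally.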
2. Query measurements from NWS and update the JRobin RrdDb file: The detailed code is listed as follows.

    public void updateDB(String[] members, NwsJRrd jrrd) {
        // links in a domain
        String[] links = this.getLinks(members);
        // valid members in a domain
        String[] mbrs = this.updateMembers(members);

        // query string
        String[] q = this.getFilename(mbrs);
        String[] result = null;
        String[] ss = null;
        long now = Util.getTime();
        long step = 0L;
        long stamp = 0L;
        double value = 0.0;
        Vector<Double> rec = new Vector<Double>();
        // ...

    public static void netReport(String rrd_file, String domain,
            long start_time, long end_time, String img_file) {

        /* start of RrdGraphDef */
        RrdGraphDef def = new RrdGraphDef();

        /* definition of graph */
        def.setMinValue(0);
        def.setRigid(true);
        def.setVerticalLabel("(Mbit/second)");
        def.setTimeSpan(start_time, end_time);
        def.setTitle(domain.replace(".", "-") + " Network last "
                + getRange(end_time - start_time));
        def.setAntiAliasing(true);
        def.setFilename(img_file);
        def.setWidth(WIDTH);
        def.setHeight(HEIGHT);
        // ...
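The updateDB listing calls this.getLinks(members), whose implementation is not shown. A minimal sketch of one possible implementation follows, assuming that a link is an ordered pair of distinct domain members encoded as "head tail", which is the format that addRrd splits on. The class name DomainLinks and the helper dsName are hypothetical and are not part of the original system.

```java
import java.util.ArrayList;
import java.util.List;

class DomainLinks {
    /**
     * Enumerate every ordered pair of distinct members as "head tail".
     * ASSUMPTION: links are directional, so (a, b) and (b, a) both appear.
     */
    static String[] getLinks(String[] members) {
        List<String> links = new ArrayList<>();
        for (String head : members) {
            for (String tail : members) {
                if (!head.equals(tail)) {
                    links.add(head + " " + tail);
                }
            }
        }
        return links.toArray(new String[0]);
    }

    /** Build a data-source name the way addRrd does: head.tail + suffix. */
    static String dsName(String link, String suffix) {
        String[] parts = link.split(" ");
        return parts[0] + "." + parts[1] + suffix;
    }

    public static void main(String[] args) {
        String[] links = getLinks(new String[] {"beta1", "beta2", "beta3"});
        for (String link : links) {
            System.out.println(dsName(link, "_b") + " / " + dsName(link, "_l"));
        }
    }
}
```

With n members this produces n(n-1) links, so the number of data sources in a domain's RrdDb file grows quadratically with the size of the domain.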
Figure 6. A monthly network graph of beta-domain

5. Experimental results

The previous work reduced the number of bandwidth measurements between all Grid Nodes, but it lacks network information between two Nodes in different Sites when neither of them is a head Node. For example, the bandwidth between Nodes A2 and B3 is not measured in the previous model.

To address this need, this work enhances the previous model by adding a switching mechanism. We call it the dynamic domain-based network information model, which is shown in Figure 7. The principal improvement is switching the head Node to the next free Node of a Site. For example, when Node A1 is busy, the head Node of Site A becomes the next free Node A2, which conducts the bandwidth measurement between itself and Nodes B3, C2, and D4, if they are the free Nodes of their respective Sites. There are three obvious advantages in using this model.

• First, the number of bandwidth measurements is still reduced to the same extent as in the previous static model.
• Second, the bandwidth between two arbitrary Nodes in two different Sites can be obtained easily.
• Finally, the bandwidth measurement yields real values instead of estimated values of a network. This helps the Resource Broker schedule jobs that span multiple Sites.

Figure 7. The enhanced design of previous work

This subsection describes experimental results of the network information model (NIM) and the dynamic network information model (DNIM). Two clusters, eta and beta, are used in this experiment. We transfer a 5 GB file from eta1 to beta1 during the period between the 20th and 30th timestamps, and the bandwidth between eta2 and beta1 is observed every 60 seconds. Figure 8 shows that the bandwidth of the connection from eta2 to beta1 obtained by NIM is a smooth curve, which cannot reflect the actual situation, whereas DNIM can present the variation of the link. Figure 9 depicts the unstable fluctuation of the error rate of NIM, which gives the broker an unreliable information reference and causes it to make wrong decisions.

6. Conclusions

This paper aims to help users make better use of the available grid resources. It looks at the use of information services in a grid and discusses the use of the Ganglia toolkit for monitoring, to enhance the information services already present in the Globus environment. Our grid resource brokerage system discovers and evaluates grid resources, and makes informed job submission decisions by matching a job’s requirements with an appropriate grid resource to meet budget and deadline requirements.
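The head-Node switching of the dynamic model in Section 5 can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this work: a Site is taken to be an ordered list of Node names, the set of busy Nodes is assumed to be known, and the head is the first Node not in that set. The names DynamicHead, selectHead, and pairCount are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

class DynamicHead {
    /** First Node of the Site, in list order, that is not busy; empty if all are busy. */
    static Optional<String> selectHead(List<String> siteNodes, Set<String> busy) {
        for (String node : siteNodes) {
            if (!busy.contains(node)) {
                return Optional.of(node);
            }
        }
        return Optional.empty();
    }

    /** Number of unordered pairs among n endpoints: n(n-1)/2. */
    static int pairCount(int n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        List<String> siteA = Arrays.asList("A1", "A2", "A3");
        Set<String> busy = new HashSet<>(Arrays.asList("A1"));
        // A1 is busy, so the head of Site A switches to A2.
        System.out.println("head of Site A: " + selectHead(siteA, busy).orElse("none"));
        // HYPOTHETICAL Grid: 4 Sites with 4 Nodes each.
        System.out.println("all Node pairs: " + pairCount(16));
        System.out.println("head-to-head pairs: " + pairCount(4));
    }
}
```

For the hypothetical Grid of 4 Sites with 4 Nodes each, measuring all Node pairs would need 120 measurements, while measuring only between the current heads needs 6 inter-Site measurements, whichever Node of a Site is acting as head at the time.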
[Plot: DNIM vs. NIM; series eta2 -> beta1 and eta2 -> beta1 (NIM); y-axis: Mb/s]

[Plot: y-axis 0 to 200; x-axis: Time (minute/unit), 1 to 15]

Figure 9. Error rate of NIM, worsening to 428.37%

References

[1] K. Czajkowski, S. Fitzgerald, I. Foster, and C. Kesselman, “Grid Information Services for Distributed Resource Sharing,” Proceedings of the Tenth IEEE International Symposium on High-Performance Distributed Computing, IEEE press, 2001.
[2] I. Foster and C. Kesselman, “The Grid 2: Blueprint for a New Computing Infrastructure,” Morgan Kaufmann, 2nd edition, 2003.
[3] I. Foster, “The Grid: A New Infrastructure for 21st Century Science,” Physics Today, 2002, vol. 55, no. 2, pp. 42-47.
[4] I. Foster and N. Karonis, “A Grid-Enabled MPI: Message Passing in Heterogeneous Distributed Computing Systems,” Proceedings of 1998 Supercomputing Conference, 1998.
[5] I. Foster and C. Kesselman, “Globus: A Metacomputing Infrastructure Toolkit,” International Journal of Supercomputer Applications, 1997, vol. 11, no. 2, pp. 115-128.
[6] V. Laszewski, I. Foster, J. Gawor, and P. Lane, “A Java commodity grid kit,” Concurrency and Computation: Practice and Experience, 2001, vol. 13, pp. 645-662.
[7] H. Le, P. Coddington, and A.L. Wendelborn, “A Data-Aware Resource Broker for Data Grids,” IFIP International Conference on Network and Parallel
[...] PDCAT’05, pp. 518-520, Dec. 2005.
[11] J. Nabrzyski, J.M. Schopf, and J. Weglarz, Grid Resource Management, Kluwer Academic Publishers, 2005.
[12] S.M. Park and J.H. Kim, “Chameleon: A Resource Scheduler in a Data Grid Environment,” Proceedings of the 3rd IEEE/ACM International Symposium on Cluster Computing and the Grid, pp. 258-265, May 2003.
[13] C.T. Yang, C.L. Lai, P.C. Shih, and K.C. Li, “A Resource Broker for Computing Nodes Selection in Grid Environments,” Grid and Cooperative Computing - GCC 2004: 3rd International Conference, Lecture Notes in Computer Science, Springer-Verlag, vol. 3251, pp. 931-934, Oct. 2004.
[14] C.T. Yang, P.C. Shih, S.Y. Chen, and W.C. Shih, “An Efficient Network Information Modeling using NWS for Grid Computing Environments,” Grid and Cooperative Computing - GCC 2005: 4th International Conference, Lecture Notes in Computer Science, Springer-Verlag, vol. 3795, pp. 287-299, Nov. 2005.
[15] C.T. Yang, C.F. Lin, and S.Y. Chen, “A Workflow-based Computational Resource Broker with Information Monitoring in Grids,” Proceedings of the 5th International Conference on Grid and Cooperative Computing (GCC 2006), IEEE CS Press, pp. 199-206, China, Oct. 2006.
[16] Ganglia, http://ganglia.sourceforge.net/
[17] JRobin, http://www.jrobin.org/
[18] Network Weather Service, http://nws.cs.ucsb.edu/ewiki/
[19] TIGER, http://gamma2.hpc.csie.thu.edu.tw/ganglia/
[20] Tomcat, http://tomcat.apache.org/
[21] UniGrid, http://140.114.91.31/ganglia/