.NET Dynamic Software Load Balancing

.NET Dynamic Software Load Balancing is an article by Stoyan Damov that presents a custom solution for dynamic load balancing implemented in .NET, focusing on balancing web servers in a web farm. The article outlines the architecture, implementation details, and the methodology for calculating machine load using performance counters. It emphasizes the importance of load balancing for mission-critical applications and provides insights into the software's components and their collaboration.


https://www.codeproject.com/Articles/3338/NET-Dynamic-Software-Load-Balancing

.NET Dynamic Software Load Balancing

Stoyan Damov


9 Dec 2002
A Draft Implementation of an Idea for .NET Dynamic Software Load Balancing
Download source code (zipped) - ~100 KB
Latest source code and documentation (would be) available here soon.

In this article
1. Introduction
a. A Teeny-Weeny Intro to Clustering and Load Balancing
b. My Idea for Dynamic Software Load Balancing
2. Architecture and Implementation
a. Architecture Outlook
b. Assemblies & Types
c. Collaboration
d. Some Implementation Details
3. Load Balancing in Action - Balancing a Web Farm
4. Building, Configuring and Deploying the Solution
a. Configuration
- What's so common in Common.h?
- Tweaking the configuration file
- ... and the other configuration file:)
b. Deployment
5. Some thoughts about MC++ and C#
a. Managed C++ to C# Translation
b. C#'s readonly fields vs MC++ non-static const members
6. "Bugs suck. Period."
7. TODO(s)
8. Conclusion
a. A (final) word about C#
9. Disclaimer

"Success is the ability to go from one failure to another with no loss of enthusiasm."
Winston Churchill

Introduction

<blog date="2002-12-05"> Yay! I passed 70-320 today and I'm now MCAD.NET. Expect the next article to
cover XML Web Services, Remoting, or Serviced Components:) </blog>

This article is about Load Balancing. Neither "Unleashed" nor "Defined" -- "Implemented":) I'm not going to discuss in detail what load balancing is, its different types, or the variety of load balancing algorithms. I'm not going to talk about proprietary software like WLBS, MSCS, COM+ Load Balancing or Application Center either. What I am going to do in this article is present a custom .NET Dynamic Software Load Balancing solution that I've implemented in less than a week, and the issues I had to resolve to make it work. Though the source code is only about 4 KLOC, by the end of this article you'll see that the solution is good enough to balance the load of the web servers in a web farm. Enjoy reading...

Everyone can read this article

...but not everybody will understand everything. To read and understand the article, you're expected to know what load balancing is in general, but even if you don't, I'll explain it shortly -- so keep reading. And to read the code, you should have some experience with multithreading and network programming (TCP, UDP and multicasting) and a basic knowledge of .NET Remoting. Contrary to what C# developers think, you don't need to know Managed C++ to read the code. When you're writing managed-only code, C# and MC++ source code look almost the same, with very few differences, so I have even included a section for C# developers which explains how to convert (most of) the MC++ code to C#.

A final warning before you focus on the article -- I'm not a professional writer, I'm just a dev, so don't expect too much from me (this is my 3rd article). If you feel that you don't understand something, that's probably because I'm not a native English speaker (I'm Bulgarian), so I haven't been able to express what I was thinking. If you find grammatical nonsense, or even a typo, report it to me as a bug and I'll be more than glad to "fix" it. And thanks for bearing with this paragraph!

A Teeny-Weeny Intro to Clustering and Load Balancing

For those who don't have a clue what Load Balancing means, I'm about to give a short explanation of
clustering and load balancing. Very short indeed, because I lack the time to write more about it, and because I
don't want to waste the space of the article with arid text. You're reading an article at www.CodeProject.com,
not at www.ArticleProject.com:) The enlightened may skip the following paragraph, and I encourage the rest to
read it.

Mission-critical applications must run 24x7, and networks need to be able to scale performance to handle large
volumes of client requests without unwanted delays. A "server cluster" is a group of independent servers
managed as a single system for higher availability, easier manageability, and greater scalability. It consists of
two or more servers connected by a network, and a cluster management software, such as WLBS, MSCS or
Application Center. The software provides services such as failure detection, recovery, load balancing, and the
ability to manage the servers as a single system. Load balancing is a technique that allows the performance of
a server-based program, such as a Web server, to be scaled by distributing its client requests across multiple
servers within a cluster of computers. Load balancing is used to enhance scalability, which boosts throughput
while keeping response times low.

I should warn you that I haven't implemented complete clustering software, but only the load balancing part of it, so don't expect anything more than that. Now that you have an idea what load balancing is, I'm sure you don't know what my idea for its implementation is. So keep reading...

My Idea for Dynamic Software Load Balancing

How do we know that a machine is busy? When we feel that our machine is getting very slow, we launch the Task Manager and look for a hung instance of iexplore.exe:) Seriously, we look at the CPU utilization. If it is low, then memory must be low, and the disk must be thrashing. If we suspect anything else to be the reason, we run the System Monitor and add some performance counters to look at. Well, this works if you're around the machine and if you have one or two machines to monitor. When you have more machines, you'll have to hire a person, and buy him 20-dioptre glasses to stare at all the machines' System Monitor consoles and go crazy in about a week :). But even if you could monitor your machines constantly, you couldn't distribute their workload manually, could you? Well, you could use some expensive software to balance their load, but I assure you that you can do it yourself, and that's what this article is all about. Just as you are able to "see" the performance counters, you can also collect their values programmatically. And I think that if we combine some of them in a certain way, and do some calculations, they could give us a value that could be used to determine the machine's load. Let's check if that's possible!

Let's monitor the \\Processor\% Processor Time\_Total and \\Processor\% User Time\_Total performance counters. You can monitor them by launching Task Manager and looking at the CPU utilization in the "Performance" tab. (The red curve shows the % Processor Time, and the green one -- the % User Time.) Stop or pause all CPU-intensive applications (WinAMP, MediaPlayer, etc.) and start monitoring the CPU utilization. You have noticed that the counter values stay almost constant, right? Now, close Task Manager, wait about 5 seconds and start it again. You should notice a big peak in the CPU utilization. In several seconds, the peak vanishes. Now, if we were reporting performance counter values instantly (as we get each counter sample), one could think that our machine was extremely busy (almost 100%) at that moment, right? That's why we're not going to report instant values; instead, we will collect several samples of the counter's values and report their average. That would be fair enough, don't you think? No?! I also don't, I was just checking you:) What about available memory, I/O, etc.? Because the CPU utilization alone is not enough for a realistic calculation of the machine's workload, we should monitor more than one counter at a time, right? And because, let's say, the current number of ASP.NET sessions is less important than the CPU utilization, we will give each counter a weight. Now the machine load will be calculated as the sum of the weighted averages of all monitored performance counters. You should already be guessing my idea for dynamic software load balancing. However, a picture is worth a thousand words, and an ASCII one is worth two thousand:) Here is a real sample, and the machine load calculation algorithm. In the example below, the machine load is calculated by monitoring 4 performance counters, each configured to collect its next sample value at equal intervals, and all counters collect the same number of samples (this would be your usual case):

+-----------+ +-----------+ +-----------+ +-----------+
|% Proc Time| |% User Time| |ASP Req.Ex.| |% Disk Time|
+-----------+ +-----------+ +-----------+ +-----------+
|Weight 0.4| |Weight 0.3| |Weight 0.2| |Weight 0.5|
+-----------+ +-----------+ +-----------+ +-----------+
| 16| | 55| | 11| | 15|
| 22| | 20| | 3| | 7|
| 8| | 32| | 44| | 4|
| 11| | 15| | 16| | 21|
| 18| | 38| | 21| | 3|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| Sum | 75| | Sum | 160| | Sum | 95| | Sum | 50|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| Avg | 15| | Avg | 32| | Avg | 19| | Avg | 10|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+
| WA | 6.0| | WA | 9.6| | WA | 3.8| | WA | 5.0|
+-----+-----+ +-----+-----+ +-----+-----+ +-----+-----+

Legend:

Sum
the sum of all counter samples
Avg
the average of all counter samples (Sum/Count)
WA
the weighted average of all counter samples (Sum/Count * Weight)
% Proc Time
(Processor\% Processor Time\_Total), the percentage of elapsed time that the processor spends executing a non-Idle thread. It is calculated by measuring the time the idle thread is active in the sample interval, and subtracting that time from the interval duration. (Each processor has an idle thread that consumes cycles when no other threads are ready to run.) This counter is the primary indicator of processor activity, and displays the average percentage of busy time observed during the sample interval
% User Time
(Processor\% User Time\_Total) is the percentage of elapsed time the processor spends in the user
mode. User mode is a restricted processing mode designed for applications, environment subsystems,
and integral subsystems. The alternative, privileged mode, is designed for operating system
components and allows direct access to hardware and all memory. The operating system switches
application threads to privileged mode to access operating system services. This counter displays the
average busy time as a percentage of the sample time
ASP Req.Ex.
(ASP.NET Applications\Requests Executing\__Total__) is the number of requests currently executing
% Disk Time
(Logical Disk\% Disk Time\_Total) is the percentage of elapsed time that the selected disk drive was
busy servicing read or write requests

Sum (% Proc Time) = 16 + 22 + 8 + 11 + 18 = 75
Average (% Proc Time) = 75 / 5 = 15
Weighted Average (% Proc Time) = 15 * 0.4 = 6.0
...
MachineLoad = Sum (WeightedAverage (EachCounter))
MachineLoad = 6.0 + 9.6 + 3.8 + 5.0 = 24.4

Architecture and Implementation

I wondered for about half a day how to explain the architecture to you. Not that it is so complex, but because it would take too much space in the article, and I wanted to show you some code, not a technical specification or even a DSS. So I wondered whether to explain the architecture using a "top-to-bottom" or "bottom-to-top" approach, or to think up something else. Finally, as most of you have already guessed, I decided to explain it in my own mixed way:) First, you should learn which assemblies the solution is comprised of, and then you can read about their collaboration, the types they contain and so on... And even before that, I recommend you read and understand two terms I've used throughout the article (and the source code's comments).

Machine Load
the overall workload (utilization) of a machine - in our case, this is the sum of the weighted averages of
all performance counters (monitored for load balancing); if you've skipped the section "My Idea for
Dynamic Software Load Balancing", you may want to go back and read it
Fastest machine
the machine with the least current load

Architecture Outlook

First, I'd like to apologize about the "diagrams". I know of only two software products that can draw the diagrams I needed for this article. I can't afford the first (and my company is not willing to pay for it either:), and the second bedeviled me so much that I dropped out of the article one UML static structure diagram, a UML deployment diagram and a couple of activity diagrams (and they were nearly complete). I won't tell you the name of the product, because I like the company that developed it very much. Just accept my apologies, and the pseudo-ASCII art which replaced the original diagrams. Sorry:)

The load balancing software comes in three parts: a server that reports the load of the machine it is running on; a server that collects such loads, no matter which machine they come from; and a library which asks the collecting server which is the least loaded (fastest) machine. The server that reports the machine's load is called the "Machine Load Reporting Server" (MLRS), and the server that collects machine loads is called the "Machine Load Monitoring Server" (MLMS). The library's name is the "Load Balancing Library" (LBL). You can deploy these three parts of the software as you like. For example, you could install all of them on all machines.
The MLRS server on each machine joins a special multicast group, designated for the purpose of load balancing, and sends messages containing the machine's load to the group's multicast IP address. Because all MLMS servers join the same group at startup, they all receive each machine's load, so if you run both MLRS and MLMS servers on all machines, they will know each other's load. So what? We have the machine loads, but what do we do with them? Well, all MLMS servers store the machine loads in a special data structure, which lets them quickly retrieve the least machine load at any time. So all machines now know which is the fastest one. Who cares? We haven't really used that information to balance any load, right? How do we query the MLMS servers for the fastest machine? The answer is that each MLMS registers a special singleton object with the .NET Remoting runtime, so the LBL can create (or get) an instance of that object and ask it for the least loaded machine. The problem is that LBL cannot ask all machines about this simultaneously (yet, but I'm thinking about this issue), so it should choose one machine (of course, it could be the machine it is running on) and will hand that load to the client application that needs the information to perform whatever load balancing activity is suitable. As you will later see, I've used LBL in a web application to distribute the workload between all web servers in a web farm. Below is a "diagram" which depicts in general the collaboration between the servers and the library:

+-----+ ______ +-----+
| A | __/ \__ | B |
+-----+ __/ \__ +-----+
+-->| LMS |<--/ Multicast \-->| LMS |<--+
| | | / \ | | |
| | LRS |-->\__ Group __/ | | |
| | | \__ __/ | | |
|<--| LBL | ^ \______/ | LBL |---+
| +-----+ | +-----+
| | +-----+
| | | C |
| | +-----+
| | | |
| | | |
| +--| LRS |
| Remoting | |
+--------------------| LBL |
+-----+

MLMS, MLRS and LBL Communication

Note: You should see the strange figure between the machines as a cloud,
i.e. it represents a LAN :) And one more thing -- if you don't understand
what multicasting is, don't worry, it is explained later in
the Collaboration section.

Now look at the "diagram" again. Let me remind you that when a machine joins a multicast group, it receives all messages sent to that group, including the messages that the machine itself has sent. Machine A receives its own load, and the load reported by C. Machine B receives the loads of A and C (it does not report its own load, because there's no MLRS server installed on it). Machine C does not receive anything, because it has no MLMS server installed. Because machine C's LBL should connect (via Remoting) to an MLMS server, and it has no such server installed, it could connect to machine A or B and query the remoted object for the fastest machine. On the "diagram" above, the LBLs of A and C communicate with the remoted object on machine A, while the LBL of B communicates with the remoted object on its own machine. As you will later see in the Configuration section, there are very few things that are hardcoded in the solution's source code, so don't worry -- you will be able to tune almost everything.

Assemblies & Types

The solution consists of 8 assemblies, but only three of them are of some interest to us now: MLMS, MLRS, and LBL, located respectively in two console applications (MachineLoadMonitoringServer.exe and MachineLoadReportingServer.exe) and one dynamic link library (LoadBalancingLibrary.dll). Surprisingly, MLMS and MLRS do not contain any types. However, they use several types to get their job done. You may wonder why I have designed them that way. Why didn't I just implement both servers directly in the executables? Well, the answer is quite simple and reflects both my strengths and weaknesses as a developer. If you have the time to read about it, go ahead, otherwise click here to skip the slight detour.

GUI programming is what I hate (though I've written a bunch of GUI apps). For me, it is mundane work, more suitable for a designer than for a developer. I love to build complex "things". Server-side applications are my favorite ones. Multi-threaded, asynchronous programming -- that's the "stuff" I love. Rare applications, that nobody "sees" except for a few administrators, who configure and/or control them using some sort of administration consoles. If these applications work as expected, the end-user will almost never know s/he is using them (e.g. in most cases, a user browsing a web site does not realize that an IIS or Apache server is processing her requests and serving the content). Now, I've written several Windows C++ services in the past, and I've written some .NET Windows services recently, so I could easily convert MLMS and MLRS to one of these. On the other hand, I love console (CUI) applications so much, and I like seeing hundreds of tracing messages on the console, so I left MLMS and MLRS in their CUI form for two reasons. The first reason is that you can quickly see what's wrong when something goes wrong (and it will, at least once:), and the second one is that I haven't debugged .NET Windows services (and because I have debugged C++ Windows services, I can assure you that it's no "piece of cake"). Nevertheless, one can easily convert both CUI applications into Windows services in less than half an hour. I haven't implemented the server classes in the executables, to make it easier for the guy who would convert them into Windows services. S/he'll need to write just 4 lines of code in the Windows Service class to get the job done:

1. Declare the server member variable:
   C++ / CLI
   LoadXxxServer __gc* server;
2. Instantiate and start it in the overridden OnStart method:
   C++ / CLI
   server = new LoadXxxServer ();
   server->Start ();
3. Stop it in the overridden OnStop method:
   C++ / CLI
   server->Stop ();

Xxx is either Monitoring or Reporting. I'm sure you now understand why I have implemented the servers' code in separate classes in separate libraries, and not directly in the executables.

I mentioned above that the solution consists of 8 assemblies, but as you remember, 2 of them (the CUIs) do not contain any types, and one of them is LBL, so what are the other 5? MLMS and MLRS use respectively the types contained in the libraries LoadMonitoringLibrary (LML) and LoadReportingLibrary (LRL). On the other hand, they and LBL use common types, shared in an assembly named SharedLibrary (SL). So the assemblies are now MLMS + MLRS + LML + LRL + LBL + SL = 6. The 7th is a simple (not interesting) CUI application I used to test the load balancing, so I'll skip it. The last assembly is the web application that demonstrates the load balancing in action. Below is a list of the four most important assemblies that contain the types and logic for the implementation of the load balancing solution.

SharedLibrary (SL) - contains common and helper types, used by LML, LRL and/or LBL. A list of the types (explained further) follows:

- ServerStatus - enumeration, used by LML and LRL's LoadXxxServer classes
- WorkerDoneEventHandler - delegate, ditto
- Configurator - utility class (I'll discuss it later), ditto
- CounterInfo - "struct" class, used by LRL and SL
- ILoadBalancer - interface, implemented in LML and used by LBL
- IpHelper - utility class, used by LML and LRL
- MachineLoad - "struct" class (with MarshalByValue semantics for the needs of the Remoting runtime), used by LML, LRL and LBL
- Tracer - utility class, which most classes in LML and LRL inherit in order to trace to the console in a consistent manner

NOTE: CounterInfo is not exactly what C++ developers call a "struct" class, because it does a lot of work behind the scenes. Its implementation is non-trivial and includes topics like timers, synchronization, and performance counter monitoring; look at the Some Implementation Details section for more information about it.

LoadMonitoringLibrary (LML) - contains the LoadMonitoringServer (LMS) class, used directly by MLMS, as well as all classes used internally in the LMS class. A list of LML's types (explained further) follows:

- LoadMonitoringServer (LMS) - class, the MLMS core
- MachineLoadsCollection - a simulation of a priority queue that stores the machines' loads in a sorted manner, so it can quickly return the least loaded machine (its implementation is more interesting than its name)
- LoadMapping - "struct" class, used internally by MachineLoadsCollection
- CollectorWorker - utility class, its only (public) method is the worker thread that accepts and collects machine load reports
- ReporterWorker - utility class, its only (public) method is the worker thread that accepts LBL requests and reports machine loads
- WorkerTcpState - "struct" class, used internally by the CollectorWorker
- WorkerUdpState - "struct" class, used internally by the ReporterWorker
- ServerLoadBalancer - a special Remoting-enabled (MarshalByRefObject) class, which is registered for remoting as a Singleton, and activated on the server side by LBL to service its requests

NOTE: I used the ReporterWorker to implement the first version of LBL in a faster, more lame way, but I dropped it later; now, LMS registers a Singleton object for the LBL requests; however, LMS is still using the (fully functional) ReporterWorker class, so one could build another kind of LBL that connects to an MLMS and asks for the least loaded machine using a simple TCP socket (I'm sorry that I've overwritten the old LBL library).

LoadReportingLibrary (LRL) - contains the LoadReportingServer (LRS) class, used directly by MLRS, as well as all classes used internally in the LRS class. A list of LRL's types (explained further) follows:

- LoadReportingServer (LRS) - class, the MLRS core
- ReportingWorker - utility class, its only (public) method is the worker thread that starts the monitoring of the performance counters and periodically reports the local machine's load to one or more MLMS

LoadBalancingLibrary (LBL) - contains just one class, ClientLoadBalancer, which is instantiated by client applications; the class contains only one (public) method, which is "surprisingly" named GetLeastMachineLoad, returning the least loaded machine:) LBL connects to LML's ServerLoadBalancer singleton object via Remoting. For more details, read the following section.

Collaboration

In order to understand how the objects "talk" to each other within an assembly and between assemblies (and on different machines), you should understand some technical terms. Because they amount to about a page, and maybe most of you do know what they mean, here's what I'll do: I'll give you a list of the terms, and if you know them, click here to read about the collaboration; otherwise, keep reading... The terms are: delegate, worker, TCP, UDP, (IP) Multicasting, and Remoting.
Delegate
a secure, type-safe way to call a method of a class indirectly, using a "reference" to that method; very
similar to and at the same time quite different from C/C++ function pointers (callbacks);
Worker
utility class, usually with just one method, which is started as a separate thread; the class holds the
data (is the state), needed by the thread to do its job;
TCP
a connection-based, stream-oriented delivery protocol with end-to-end error detection and correction.
Connection-based means that a communication session between hosts is established before
exchanging data. A host is any device on a TCP/IP network identified by a logical IP address. TCP
provides reliable data delivery and ease of use. Specifically, TCP notifies the sender of packet delivery,
guarantees that packets are delivered in the same order in which they were sent, retransmits lost
packets, and ensures that data packets are not duplicated;
UDP
a connectionless, unreliable transport protocol. Connectionless means that a communication session
between hosts is not established before exchanging data. UDP is often used for one-to-many
communications that use broadcast or multicast IP datagrams. The UDP connectionless datagram
delivery service is unreliable because it does not guarantee data packet delivery and no notification is
sent if a packet is not delivered. Also, UDP does not guarantee that packets are delivered in the same
order in which they were sent. Because delivery of UDP datagrams is not guaranteed, applications
using UDP must supply their own mechanisms for reliability, if needed. Although UDP appears to have
some limitations, it is useful in certain situations. For example, Winsock IP multicasting is implemented
with UDP datagram type sockets. UDP is very efficient because of low overhead. Microsoft networking
uses UDP for logon, browsing, and name resolution;
Multicasting
technology that allows data to be sent from one host and then replicated to many others without
creating a network traffic nightmare. This technology was developed as an alternative to broadcasting,
which can negatively impact network bandwidth if used extensively. Multicast data is replicated to a
network only if processes running on workstations in that network are interested in that data. Not all
protocols support the notion of multicasting -- on Win32 platforms, only two protocols are capable of
supporting multicast traffic: IP and ATM;
IP Multicasting
IP multicasting relies on a special group of addresses known as multicast addresses. It is this group
address that names a given group. For example, if five machines all want to communicate with one
another via IP multicast, they all join the same group address. Once they are joined, any data sent by
one machine is replicated to every member of the group, including the machine that sent the data. A
multicast IP address is a class D IP address in the range 224.0.0.0 through 239.255.255.255
Remoting
the process of communication between different operating system processes, regardless of whether
they are on the same computer. The .NET remoting system is an architecture designed to simplify
communication between objects living in different application domains, whether on the same computer
or not, and between different contexts, whether in the same application domain or not.
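As a quick illustration of the IP multicasting definition above, a class D (multicast) address can be recognized from its first octet alone -- it must fall in the range 224 through 239. A small helper (my own sketch, not part of the article's code) makes the range concrete:

```cpp
#include <string>

// True if the dotted-quad IPv4 address is a class D (multicast) address,
// i.e. its first octet is in 224..239 (top four bits are 1110).
// Assumes a well-formed "a.b.c.d" string; validation is omitted for brevity.
bool IsMulticastAddress(const std::string& ip) {
    int firstOctet = std::stoi(ip.substr(0, ip.find('.')));
    return firstOctet >= 224 && firstOctet <= 239;
}
```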

I'll start from the inside out, i.e. I'll first explain how the various classes communicate with each other within the assemblies, and then I'll explain how the assemblies collaborate with each other.

In-Assembly collaboration (a thread synchronization how-to:)

When the LoadReportingServer and LoadMonitoringServer classes are instantiated, and their Start methods are called, they launch respectively one or two threads to do their job asynchronously (and to be able to respond to "Stop" commands, of course). Well, if starting a thread is very easy, controlling it is not that easy. For example, when the servers should stop, they should notify the threads that they are about to stop, so the threads can finish their job and exit appropriately. On the other hand, when the servers launch the threads, they should be notified when the threads are about to enter their thread loops and have executed their initialization code. In the next couple of paragraphs I'll explain how I've solved these synchronization issues, and if you know a cooler way, let me know (via the message board below). In the paragraphs below, I'll refer to the instances of the LoadReportingServer and LoadMonitoringServer classes as "(the) server".

When the Start method is executed, the LMS object creates a worker class instance, passing it a reference to itself (this), a reference to a delegate, and some other useful variables that are not interesting for this section. The server object then creates an AutoResetEvent object in an unsignalled state. Then the LMS object starts a new thread, passing for the ThreadStart delegate the address of a method in the worker class. (I call a worker class method, launched as a thread, a worker thread.) After the thread has been started, the server object blocks, waiting (infinitely) for the event object to be signalled. Now, when the thread's initialization code completes, it calls back the server via the server-supplied delegate, passing a boolean parameter showing whether its initialization code executed successfully or something went wrong. The target method of the delegate in the server class sets (puts in a signalled state) the AutoResetEvent object and records the result of the thread initialization in a private boolean member. Setting the event object unblocks the server: it now knows that the thread's startup code has completed, and also knows the result of the thread's initialization. If the thread did not manage to initialize successfully, it has already exited, and the server just stops. If the thread succeeded in initializing, it enters its thread loop and waits for the server to inform it when it should exit the loop (i.e. the server is stopping). One could argue that this "worker thread-to-main thread" synchronization looks too complicated, and he might be right. If we only needed to know that the thread had finished its initialization code (and didn't care whether it initialized successfully), we could directly pass the worker a reference to the AutoResetEvent object, and the thread would then set it to a signalled state; but you saw that we need to know whether the thread initialized successfully or not.

Now that was the more complex part. The only issue we have to solve now is how to stop the
thread, i.e. make it exit its thread loop. Well,
that's what I call a piece of cake. If you
remember, the server has passed a reference to
itself (this) to the worker. The server has
a Status property, which is an enumeration,
describing the state of the server (Starting,
Started, Stopping, Stopped). Because the thread
has a reference to the server, in its thread loop it
checks (by invoking the Status property) whether
the server is not about to stop (Status ==
ServerStatus::Stopping). If the server is stopping,
so is the thread, i.e. the thread exits silently and
everything's OK. So when the server is
requested to stop, it modifies its private member
variable status to Stopping and Joins the thread
(waits for the thread to exit) for a configured
interval of time. If the thread exits in the specified
amount of time, the server changes its status
to Stopped and we're done. However, a thread
may timeout while processing a request, so the
server then aborts the thread by calling the
thread's Abort method. I've written the thread
loops in try...catch...finally blocks and in
their catch clause, the threads check whether
they die a violent death:), i.e.
a ThreadAbortException was raised by the
server. The thread then executes its cleanup
code and exits. (And I thought that was easier to
explain:)

That much about how the server classes talk to worker classes (main thread to worker threads).
The rest of the objects in the assemblies
communicate using references to each other or
via delegates. Now comes the part that explains
how the assemblies "talk" to each other, i.e. how
the MLRS sends its machine's load to the MLMS,
and how LBL gets the minimum machine load
from MLMS.

Cross-machine assembly collaboration

I'll first "talk" about how the MLRS reports the machine load to MLMS. To save some space in the article
(some of your bandwidth, and some typing for
me:), I'll refer to the LoadReportingServer class
as LRS and to the LoadMonitoringServer class
as LMS. Do not confuse them with the server
applications, having an "M" prefix.

LMS starts two worker threads: one for collecting machine loads, and one for reporting
the minimum load to interested clients. The
former is named CollectorWorker, and the latter
-- ReporterWorker. I've mentioned somewhere
above that the ReporterWorker is not so
interesting, so I'll talk only about
the CollectorWorker. In the paragraphs below, I'll
call it simply a collector. When the collector
thread is started, it creates a UDP socket, binds
it locally and adds it to a multicast group. That's
the collector's initialization code. It then enters a
thread loop, periodically polling the socket for
arrived requests. When a request comes, the
collector reads the incoming data, parses it,
validates it and if it is a valid machine load report,
it enters the load in the machine loads
collection of the LMS class. That's pretty much
everything you need to know about how MLMS
accepts machine loads from MLRS.

LRS starts one thread for reporting the current machine load. The worker's name
is ReportingWorker, and I'll refer to it as reporter.
The initialization code of this thread is to start
monitoring the performance counters, create a
UDP socket and make it a member of the same
multicast group that MLMS's collector object has
joined. In its thread loop, the reporter waits a
predefined amount of time, then gets the current
machine load and sends it to the multicast
endpoint. A network device, called a "switch"
then dispatches the load to all machines that
have joined the multicast group, i.e. all MLMS
collectors will receive the load, including the
MLMS that runs on the MLRS machine (if an MLMS is installed and running there).

Here comes the most interesting part -- how LBL queries which machine has the least load
(the fastest machine). Well, it is quite simple and
requires only basic knowledge about .NET
Remoting. If you don't understand Remoting, but
you do understand DCOM, assume that .NET
Remoting compared to DCOM is what C++ is
compared to C. You'll be quite close and at the
same time quite far from what Remoting really is,
but you'll get the idea. (In fact, I've read several
books on DCOM, and some of them referred to it
as "COM Remoting Infrastructure"). When MLMS
starts, it registers a class
named ServerLoadBalancer with the Remoting
runtime as a singleton (an object that is
instantiated just once, and further requests for its
creation end up getting a reference to the same,
previously instantiated object). When a request
to get the fastest machine comes
(GetLeastMachineLoad method gets called) the
singleton asks
the MachineLoadsCollection object to return its
least load, and then hands it to the client object
that made the remoted call.

Below is a story you would like to hear about remoted objects that need to have parameter-
less constructors. If you'd like to skip the
story, click here, otherwise enjoy...

While all of you know that an object may be registered for remoting, probably not many of
you know that you do not have easy control over
the object's instantiation. Which means that you
don't create an instance of the singleton object
and register it with the Remoting runtime, but
rather the Remoting runtime creates that object
when it receives the first request for the object's
creation. Now, all server-activated objects must
have a parameter-less constructor, and the
singleton is not an exception. But we want to
pass our ServerLoadBalancer class a reference
to the machine loads collection. I see only two
ways to do that -- the first one is to register the
object with the Remoting runtime, create an
instance of it via Remoting and call an "internal"
method Initialize, passing the machine loads
collection to it. At first that sounded like a good
idea and I did it just like that. Then I launched the
client testing application first, and the server after
it. Can you guess what happened? The client
managed to create the singleton first, and it was
not initialized -- boom!!! Not what we expected,
right? So I thought a bit how to find a
workaround. Luckily, it occurred to me how to
hack this problem. I decided to make a static
member of the LoadMonitoringServer class,
which would hold the machine loads collection.
At the beginning it would be a null reference, then
when the server starts, I would set it to the
server's machine loads collection. Now when our
"parameter-less constructed" singleton object is
instantiated for the first time by the Remoting
runtime, it would get the machine loads via
the LoadMonitoringServer::StaticMachineLoads
member variable and the whole problem has
disappeared. I had to only mark the static
member variable as (private public) so it is visible
only within the assembly. I know my approach is
a hack, and if you know a better pattern that
solves my problem, I'll be happy to learn it.

Here's another interesting issue. How does the client (LBL) compile against the
remoted ServerLoadBalancer class? Should it
have a reference (#using "...dll") to LML or what?
Well, there is a solution to this problem, and I
haven't invented it, though I'd have liked to:) I
mentioned before, that the SharedLibrary has
some shared types, used by LBL, LMS and LRS.
No, it's not what you're thinking! I haven't put
the ServerLoadBalancer class there even if I
wanted to, because it requires
the MachineLoadsCollection class, and the latter
is located in LML. What I consider an elegant
solution, (and what I did) is defining an interface
in the SharedLibrary, which I implemented in
the ServerLoadBalancer class in LML. LBL tries
to create the ServerLoadBalancer via Remoting,
but it does not explicitly try to create
a ServerLoadBalancer instance, but an instance,
implementing the ILoadBalancer interface. That's
how it works. LBL creates/activates the singleton
on the LMS side via Remoting and calls
its GetLeastMachineLoad method to determine
the fastest machine.

Some Implementation Details

Below is a list of helper classes that are cool, reusable, or worth mentioning. I'll try to explain
their cool sides, but you should definitely peek at
the source code to see them:)

Configurator

I like the .NET configuration classes very much, and I hate reinventing the wheel, but this class is
a specific configuration class for this solution and
is cooler than the .NET configuration classes in at
least one respect. What makes the class cooler is that it
can notify certain objects when the configuration
changes, i.e. the underlying configuration file has
been modified with some text editor. So I've built
my own configurator class, which uses the
FileSystemWatcher class to sniff for writes in the
configuration file, and when the file changes, the
configurator object re-loads the file, and raises
an event to all subscribers that need to know
about the change. These subscribers are only
two, and they are the Load Monitoring and
Reporting servers. When they receive the event,
they restart themselves, so they can reflect the
latest changes immediately.
CounterInfo

I used to call this class a "struct" one. I wasn't fair to it :), as it is one of the most important classes
in the solution. It wraps a PerformanceCounter
object in it, retrieves some sample values, and
stores them in a cyclic queue. What is a cyclic
queue? Well, I guess there's no such animal :)
but as I have "invented" it, let me explain what
it is. It is a simple queue with a finite number of
elements allowed to be added. When the queue
"overflows", it pops up the first element, and
pushes the new element into the queue. Here's
an example of storing the numbers from 1 to 7 in
a 5-element cyclic queue:

Plain Text
Pass Queue       Running Total (Sum)
---- -----       -------------------
     []          = 0
1    [1]         = 0 + 1 = 1
2    [2 1]       = 1 + 2 = 3
3    [3 2 1]     = 3 + 3 = 6
4    [4 3 2 1]   = 6 + 4 = 10
5    [5 4 3 2 1] = 10 + 5 = 15
6    [6 5 4 3 2] = 15 - 1 + 6 = 20
7    [7 6 5 4 3] = 20 - 2 + 7 = 25

Why do I need the cyclic queue? To have a limited state of each monitored performance
counter, of course. If pass 5 was the state of the
counter 3 seconds ago, then its average was 15/5
= 3, and if we are now at pass 7, the counter's
average is 25/5 = 5. Sounds realistic, doesn't it?
So we use the cyclic queue to store the transitory
counter samples and know its average for the
past N samples which were measured for the
past M seconds. You can see how easily the
running sum is calculated. Now the only thing a
counter has to do to tell its average is to divide
the running sum by the number of sample values it
has collected. You know that the machine load is
the sum of the weighted averages of all monitored
performance counters for the given machine. But
you might ask, what happens in the following
situation:

We have two machines: A and B. Both are measuring just one counter, their CPU utilization.
A Machine Load Monitoring Server is running on
a third machine C, and a load balancing client is
on fourth machine D. A and B's Load Reporting
Servers are just started. Their CounterInfo
classes have recorded respectively 50 and 100
(because the administrator on machine B has
just launched IE:). A and B are configured to
report each second, but they should report the
weighted averages of 5 sample values. 1 second
elapses, but both A and B have collected only 1
sample value. Now D asks C which is the least
loaded machine. Which one should be reported?
A or B? The answer is simple: None. No
machine is allowed to report its load unless it has
collected the necessary number of sample
values for all performance counters. That means,
that until A and B have filled up their cyclic
queues for the very first time, they block and
don't return their weighted average to the caller
(the LRS's reporter worker).

MachineLoadsCollection

This class is probably more tricky than interesting. Generally, it is used to store the
loads that one or more LRS report to LMS.
That's the class' dumb side. One cool side of the
class is that it stores the loads in 3 different data
structures to simulate one, that is missing in
the .NET BCL - a priority queue that can store
multiple elements with the same key, or in STL
terms, something like

std::priority_queue <std::vector <X *>, ... >

I know that C++ die-hards know it by heart, but for the rest of the audience: std::priority_queue is
a template container adaptor class that provides
a restriction of functionality limiting access to the
top element of some underlying container type,
which is always the largest or of the highest
priority. New elements can be added to
the priority_queue and the top element of
the priority_queue can be inspected or removed.
I took the definition from MSDN, but I'd like to
correct it a little bit: you should read "which is
always the largest or of the highest priority" as
"which is always what the less functor returns as
largest or of the highest priority". At the
beginning, I thought to use
the priority_queue template class, and put
there "gcroot"-ed references, but then I thought
that it would be more confusing and difficult than
helping me, and you, the reader. Do you know
what the "gcroot" template does? No?
Never mind then:) In the .NET BCL, we have
something which is very similar to a priority
queue -- that's the SortedList class
in System::Collections. Because it can store
any Object-based instances, we could
put ArrayList references in it to simulate a priority
queue that stores multiple elements with the
same key. There's also a Hashtable to help us
solve certain problems, but we'll get to it in a
minute. Meanwhile, keep reading to understand
why I need these data structures in the first
place.

Machine loads do not enter the machine loads collection by name, i.e. they are added to the
loads collection with the key, being the machine's
load. That's why before each machine reports its
load, it converts the latter to unsigned long and
then transmits it over the wire to LMS. This helps
restrict the number of stored loads, e.g. if
machine A has a load of 20.1, and machine B
has a load of 20.2, then the collection considers
the loads as equal. When LMS "gets" the load, it
adds it in the SortedList, i.e. if we have three
machines -- "A", "B", and "C" with loads 40, 20
and 30, then the SortedList looks like:

Plain Text
[B:20][C:30][A:40]

If anyone asks for the fastest machine, we always return the 1st positional element in the sorted list (because it is sorted in ascending order).

Well, I'd like it to be that simple, but it isn't. What happens when a 4th machine, "D", reports a load
20? You should have guessed by now why I
need to store an ArrayList for each load, so here
it is in action -- it stores the loads of machines B
and D:

Plain Text
[D:20]
[B:20][C:30][A:40]

Now, if anyone asks for the fastest machine, we will return the first element of the ArrayList that
is stored in the first element of the SortedList,
right? It is machine "B".

But then what happens when machine "B" reports another load, equal to 40? Shall we leave
the first reported load? Of course, not!
Otherwise, we will return "B" as the fastest
machine, whereas "D" would be the one with the
least load. So we should remove machine "B"'s
older load from the first ArrayList and insert its
new load, wherever is appropriate. Here's the
data structure then:
Plain Text

[D:20][C:30][A:40]

Now how did you find machine "B"'s older load in order to remove it? Eh? I guess with your eyes.
Here's where we need that Hashtable I
mentioned above. It is a mapping between a
machine's older load and the list it resides in
currently. So when we add a machine load, we
first check whether the machine has reported a
load before, and if it did, we find the ArrayList,
where the old load was placed, remove it from
the list, and add the new load to a new list, right?
Wrong. We have one more thing to do, but first
let me show you the problem, and you'll guess
what else I've done to make the collection work
as expected.

Imagine that machine "D" reports a new load -- 45. Now you'll say that the data looks like
the one below:

Plain Text

[C:30][A:40][D:45]

You wish it looks like this! But that's because I made a mistake when I was trying to visualize
the first loads. Actually the previous loads
collection looked like this:

Plain Text
^
|
M
A
C . . . .
H . . . .
I . . . .
N . . . .
E . . B .
S D C A .

LOAD 20 30 40 . . . -->

So you will now agree that the collection actually looks like this:

Plain Text
      ^
      |
   M
   A
   C     .    .    .    .
   H     .    .    .    .
   I     .    .    .    .
   N     .    .    .    .
   E     .    .    B    .
   S     .    C    A    D

LOAD    20   30   40   45   .  .  -->

Yes, the first list is empty, and when a request to find the least loaded machine comes and you try
to pop up the first element of the ArrayList for
load 20 (which is the least load), you'll
get IndexOutOfRangeException, as I got it a
couple of times before I debugged to understand
what was happening. So when we remove an
old load from an ArrayList, we should check
whether it has become orphaned (is now empty), and if
this is the case, we should remove
the ArrayList from the SortedList as well.

Here's the code for the Add method:

C++ / CLI
void MachineLoadsCollection::Add (MachineLoad __gc* machineLoad)
{
    DEBUG_ASSERT (0 != machineLoad);
    if (0 == machineLoad)
        return;

    String __gc* name = machineLoad->Name;
    double load = machineLoad->Load;
    Object __gc* boxedLoad = __box (load);

    rwLock->AcquireWriterLock (Timeout::Infinite);

    // a list of all machines that have reported this particular
    // load value
    //
    ArrayList __gc* loadList = 0;

    // check whether any machine has reported such a load
    //
    if (!loads->ContainsKey (boxedLoad))
    {
        // no, this is the first load with this value - create new list
        // and add the list to the main loads (sorted) list
        //
        loadList = new ArrayList ();
        loads->Add (boxedLoad, loadList);
    }
    else
    {
        // yes, one or more machines reported the same load already
        //
        loadList = static_cast<ArrayList __gc*>
            (loads->get_Item (boxedLoad));
    }

    // check if this machine has already reported a load previously
    //
    if (!mappings->ContainsKey (name))
    {
        // no, the machine is reporting for the first time;
        // insert the element and add the machine to the mappings
        //
        loadList->Add (machineLoad);
        mappings->Add (name, new LoadMapping (machineLoad, loadList));
    }
    else
    {
        // yes, the machine has reported its load before;
        // we should remove the old load; get its mapping
        //
        LoadMapping __gc* mappedLoad = static_cast<LoadMapping __gc*>
            (mappings->get_Item (name));

        // get the old load, and the list we should remove it from
        //
        MachineLoad __gc* oldLoad = mappedLoad->Load;
        ArrayList __gc* oldList = mappedLoad->LoadList;

        // remove the old mapping
        //
        mappings->Remove (name);

        // remove the old load from the old list
        //
        int index = oldList->IndexOf (oldLoad);
        oldList->RemoveAt (index);

        // insert the new load into the new list
        //
        loadList->Add (machineLoad);

        // update the mappings
        //
        mappings->Add (name, new LoadMapping (machineLoad, loadList));

        // finally, check if the old load list is totally empty
        // and if so, remove it from the main (sorted) list
        //
        if (oldList->Count == 0)
            loads->Remove (__box (oldLoad->Load));
    }

    rwLock->ReleaseWriterLock ();
}

Now, for the curious, here's the get_MinimumLoad property's code:

C++ / CLI
MachineLoad __gc* MachineLoadsCollection::get_MinimumLoad ()
{
    MachineLoad __gc* load = 0;

    rwLock->AcquireReaderLock (Timeout::Infinite);

    // if the collection is empty, no machine has reported
    // its machineLoad, so we return "null"
    //
    if (loads->Count > 0)
    {
        // the 1st element should contain one of the least
        // loaded machines -- they all have the same load
        // in this list
        //
        ArrayList __gc* minLoadedMachines = static_cast<ArrayList __gc*>
            (loads->GetByIndex (0));
        load = static_cast<MachineLoad __gc*>
            (minLoadedMachines->get_Item (0));
    }

    rwLock->ReleaseReaderLock ();

    return (load);
}

Well, that's pretty much how the MachineLoadsCollection class works in order
to store the machine loads, and return the least
loaded machine. Now we will see what else is
cool about this class. I called it the Grim Reaper,
and that's what it is -- a method,
named GrimReaper (GR), that runs
asynchronously (using a Timer class) and kills
dead machines!:) Seriously, GR knows the
interval at which each machine, once reported a
load, should report it again. If a machine fails to
report its load in a timely manner it is removed
from the MachineLoadsCollection container. In
this way, we guarantee that a machine, that is
now dead (or is disconnected from the network)
will not be returned as the fastest machine, at
least not before it reports again (it is brought
back to the load balancing then). However, in
only about 30 lines of code, I managed to make
two mistakes in the GR code. The first one was
very lame -- I was trying to remove an element
from a hash table while I was iterating over its
elements, but the second was a real bitch!
However, I found it quite quickly, because I love
console applications:) I was outputting a star (*)
when GR was executing, and a caret (^) when it
was killing a machine. I then observed that even
if the (only) machine was reporting regularly its
load, at some time, GR was killing it! I was
staring at the console at least for 3 minutes. The
GR code was simple, and I thought that there's
no chance to make a mistake there. I was wrong.
It occurred to me that I wasn't considering the
fact, that the GR code takes some time to
execute. It was running fast enough, but it was
taking some interval of time. Well, during that
time, GR was locking the machine loads
collection. And while the collection was locked,
the collector worker was blocked, waiting for the
collection to be unlocked, so it can enter the
newly received load there. So when the collection
was finally unlocked at the end of the GR code,
the collector entered the machine's load. You
can guess what happens when the GR is
configured to run in shorter intervals and the
machines report in longer intervals. GR locks,
and locks and locks, while the collector blocks,
and blocks and blocks, until a machine is
delayed by the GR itself. However, because GR
is oblivious to the outer world, it thinks that the
machine is a dead one, so it removes the
machine from the load balancing, until the next
time it reports a brand new load. My solution for
this issue? I have it in my head, but I'll implement
it in the next version of the article, because I
really ran out of time. (I couldn't post the article
for the November's contest, because I couldn't
finish this text in time. It looks that writing text in
plain English is more difficult than writing
Managed C++ code, and I don't want to miss
December's contest too:)

If anyone is interested, here is Grim Reaper's code:

C++ / CLI
void MachineLoadsCollection::GrimReaper (Object __gc* state)
{
    // get the state we need to continue
    //
    MachineLoadsCollection __gc* mlc =
        static_cast<MachineLoadsCollection __gc*> (state);

    // temporarily suspend the timer
    //
    mlc->grimReaper->Change (Timeout::Infinite, Timeout::Infinite);

    // check if we are forced to stop
    //
    if (!mlc->keepGrimReaperAlive)
        return;

    // get the rest of the fields to do our work
    //
    ReaderWriterLock __gc* rwLock = mlc->rwLock;
    SortedList __gc* loads = mlc->loads;
    Hashtable __gc* mappings = mlc->mappings;
    int reportTimeout = mlc->reportTimeout;

    rwLock->AcquireWriterLock (Timeout::Infinite);

    // Bring out the dead :)
    //
    // enumerating via an IDictionaryEnumerator, we can't delete
    // elements from the hashtable mappings; so we create a temporary
    // list of machines for deletion, and delete them after we have
    // finished with the enumeration
    //
    StringCollection __gc* deadMachines = new StringCollection ();

    // walk the mappings to get all machines
    //
    DateTime dtNow = DateTime::Now;
    IDictionaryEnumerator __gc* dic = mappings->GetEnumerator ();
    while (dic->MoveNext ())
    {
        LoadMapping __gc* map = static_cast<LoadMapping __gc*> (dic->Value);

        // check whether the dead timeout has expired for this machine
        //
        TimeSpan tsDifference = dtNow.Subtract (map->LastReport);
        double difference = tsDifference.TotalMilliseconds;
        if (difference > (double) reportTimeout)
        {
            // remove the machine from the data structures; it is
            // now considered dead and does not participate anymore
            // in the load balancing, unless it reports its load
            // at some later time
            //
            String __gc* name = map->Load->Name;

            // get the old load, and the list we should remove it from
            //
            MachineLoad __gc* oldLoad = map->Load;
            ArrayList __gc* oldList = map->LoadList;

            // remove the old mapping (only add it to the deletion list)
            //
            deadMachines->Add (name);

            // remove the old load from the old list
            //
            int index = oldList->IndexOf (oldLoad);
            oldList->RemoveAt (index);

            // finally, check if the old load list is totally empty
            // and if so, remove it from the main list
            //
            if (oldList->Count == 0)
                loads->Remove (__box (oldLoad->Load));
        }
    }

    // actually remove the dead machines from the mappings
    //
    for (int i = 0; i < deadMachines->Count; i++)
        mappings->Remove (deadMachines->get_Item (i));

    // cleanup
    //
    deadMachines->Clear ();

    rwLock->ReleaseWriterLock ();

    // resume the timer
    //
    mlc->grimReaper->Change (reportTimeout, reportTimeout);
}
Load Balancing in Action - Balancing a Web
Farm

I've built a super simple .NET Web application (in C#) that uses LBL to perform load balancing in
a web farm. Though the application is very little,
it is interesting and deserves some space in this
article, so here we go. First, I've written a class
that wraps the load balancing
class ClientLoadBalancer from LBL, named
it Helper, and implemented it as a singleton so
the Global class of the web application and the
web page classes could see one instance of it.
Then I used it in the Session_OnStart method of
the Global class to redirect every new session's
first HTTP request to the most available
machine. Furthermore, in the sample web page,
I've used it again to dynamically build URLs for
further processing, replacing the local host again
with the fastest machine. Now one may argue
(and he might be right) that a user can spend a
lot of time reading that page, so when he
eventually clicks on the "faster" link, the
previously fastest machine might no longer be the
fastest one at that time. Just don't forget that
hitting another machine's web application will
cause its Session_OnStart to fire again, so the
user will be redirected to the fastest machine
anyway. Now, if you don't get what I am talking
about, that's because I haven't shown any code
yet. So here it is:

C#
protected void Session_Start (object sender, EventArgs e)
{
    // get the fastest machine from the load balancer
    //
    string fastestMachineName =
        Helper.Instance.GetFastestMachineName ();

    // we should check whether the fastest machine is not the machine
    // this web application is running on, as then there'll be no sense
    // to redirect the request
    //
    string thisMachineName = Environment.MachineName;
    if (String.Compare (thisMachineName, fastestMachineName, false) != 0)
    {
        // it is another machine and we should redirect the request
        //
        string fasterUrl = Helper.Instance.ReplaceHostInUrl (
            Request.Url.ToString (),
            fastestMachineName);
        Response.Redirect (fasterUrl);
    }
}
And here's the code in the sample web page:

C#
private void OnPageLoad (object sender, EventArgs e)
{
    // get the fastest machine and generate the new links with it
    //
    string fastestMachineName =
        Helper.Instance.GetFastestMachineName ();
    link.Text = String.Format (
        "Next request will be processed by machine '{0}'",
        fastestMachineName);

    // navigate to the same URL, but the host being the fastest machine
    //
    link.NavigateUrl = Helper.Instance.ReplaceHostInUrl (
        Request.Url.ToString (),
        fastestMachineName);
}

If you think that I hardcoded the settings in the Helper class, you are wrong. First, I hate
hardcoded or magic values in my code (though
you may see some in an article like this).
Second, I was testing the solution on my
colleagues' computers, so writing several lines of
code in advance helped me avoid the otherwise
inevitable re-compilations. I just
deployed the web application there. Here's the
trivial C# code of the Helper class (note that I
have hardcoded the keys in Web.config file ;-)

C#
class Helper
{
    private Helper ()
    {
        // assume failure(s)
        //
        loadBalancer = null;
        try
        {
            NameValueCollection settings =
                ConfigurationSettings.AppSettings;

            // assume that MLMS is running on our machine and the web app
            // is configured to create its remoted object using the
            // defaults; if the user has configured another machine in the
            // Web.config running MLMS, try to get its settings and create
            // the remoted object on it
            //
            string machine = Environment.MachineName;
            int port = 14000;
            RemotingProtocol protocol = RemotingProtocol.TCP;

            string machineName = settings ["LoadBalancingMachine"];
            if (machineName != null)
                machine = machineName;

            string machinePort = settings ["LoadBalancingPort"];
            if (machinePort != null)
            {
                try
                {
                    port = int.Parse (machinePort);
                }
                catch (FormatException)
                {
                }
            }

            string machineProto = settings ["LoadBalancingProtocol"];
            if (machineProto != null)
            {
                try
                {
                    protocol = (RemotingProtocol) Enum.Parse (
                        typeof (RemotingProtocol),
                        machineProto,
                        true);
                }
                catch (ArgumentException)
                {
                }
            }

            // create a proxy to the remoted object
            //
            loadBalancer = new ClientLoadBalancer (
                machine,
                protocol,
                port);
        }
        catch (Exception e)
        {
            if (e is OutOfMemoryException || e is ExecutionEngineException)
                throw;
        }
    }

    public string GetFastestMachineName ()
    {
        // assume that the load balancer could not be created or will fail
        //
        string fastestMachineName = Environment.MachineName;
        if (loadBalancer != null)
        {
            MachineLoad load = loadBalancer.GetLeastMachineLoad ();
            if (load != null)
                fastestMachineName = load.Name;
        }
        return (fastestMachineName);
    }

    public string ReplaceHostInUrl (string url, string newHost)
    {
        Uri uri = new Uri (url);
        bool hasUserInfo = uri.UserInfo.Length > 0;
        string credentials = hasUserInfo ? uri.UserInfo : "";
        string newUrl = String.Format (
            "{0}{1}{2}{3}:{4}{5}",
            uri.Scheme,
            Uri.SchemeDelimiter,
            credentials,
            newHost,
            uri.Port,
            uri.PathAndQuery);
        return (newUrl);
    }

    public static Helper Instance
    {
        get { return (instance); }
    }

    private ClientLoadBalancer loadBalancer;
    private static Helper instance = new Helper ();
} // Helper

If you wonder what the servers look like when running, and what a great look and feel I've
designed for the web application, here's a
screenshot to disappoint you:)

Building, Configuring and Deploying the Solution

There's a little trick you need to do in order to load the solution file. Open your IIS
administration console (Start/Run... type inetmgr)
and create a new virtual
directory LoadBalancingWebTest. When you're
asked about the folder, choose
X:\Path\To\SolutionFolder\LoadBalancingWebTest. You can
now open the solution file
(SoftwareLoadBalancing.sln) with no problems.
Load it in Visual Studio .NET, build
the SharedLibrary project first, as the others
depend on it, then build LML and LRS, and then
the whole solution. Note that the setup projects
won't build automatically so you should select
and build them manually.

Note: When you compile the solution, you will get 15 warnings. All of them state: warning
C4935: assembly access specifier modified
from 'xxx', where xxx could be private or public. I
don't know how to make the compiler stop
complaining about this. There are no other
warnings at level 4. Sorry if these embarrass you.

That's it if you have VS.NET. If you don't, you can compile only the web application, as it is
written in C# and can be compiled with the free
C# compiler that comes with the .NET Framework.
Otherwise, buy a copy of VS.NET, and become a
CodeProject (and Microsoft) supporter :) BTW, I
realized right now, that I should write my next
articles in C#, so the "poor" guys like me can
have some fun too. I'm sorry guys! I promise to
use C# in most of the next articles I attempt to
write.

Configuration

If you look at the Common.h header, located in the SharedFiles folder in the solution, you'll notice that I've copied and pasted the meaning of all configuration keys from that file. However, because I know you won't look at it until you've liked the article (and it's high time to do so, as it is coming to its end:), here's the explanation of the XML configuration file, and various macros in Common.h.

What's so common in Common.h?

This header file is used by almost all projects in the solution. It has several (helpful) macros I'm about to discuss, so if you're in the mood to read about them, go ahead. Otherwise, click here to read only about the XML configuration file.

First, I'm going to discuss the .NET member access modifiers. There are 5 of them, though you may use only four of them, unless you are writing IL code. Existing languages refer to them in a different way, so here is a comparison of their names and some explanations.

.NET term: private -- C# keyword: private; MC++ keywords: private private. The member is visible only in the class it is defined in, and is not visible from other assemblies. Note the double use of the private keyword in MC++ -- the first one specifies whether the member is visible from other assemblies, and the second specifies whether the member is visible from other classes within the same assembly.

.NET term: public -- C# keyword: public; MC++ keywords: public public. Visible from all assemblies and classes.

.NET term: family -- C# keyword: protected; MC++ keywords: public protected. Visible from all assemblies, but can be used only from derived classes.

.NET term: family and assembly -- C# keyword: internal; MC++ keywords: private public. Visible from all classes within the assembly, but not visible to external assemblies.

Because I like the C# keywords most, I #defined and used throughout the code four macros to avoid typing the double keywords in MC++:

#define PUBLIC    public public
#define PRIVATE   private private
#define PROTECTED public protected
#define INTERNAL  private public

Here comes the more interesting "stuff". You have three options for communication between the load reporting and monitoring servers: UDP + multicasting, UDP-only, or TCP. BTW, if I were writing the article in C#, you wouldn't have them. Really! C# is so lame in preprocessing, and the compiler writers were so wrong not to include some real preprocessing capabilities in the compiler, that I have no words! Nevertheless, I wrote the article in MC++, so I have the cool #define directives I needed so badly when I started to write the communication code of the classes. There are two macros you can play with, to make the solution use one communication protocol or another, and/or disable/enable multicasting. Here are their definitions:

C++ / CLI
#define USING_UDP 1
#define USING_MULTICASTS 1

Now, a C# guru:) will argue that I could still write


the protocol-independent code with several pairs
of #ifdef and #endif directives. To tell you the
truth, I'm not a fan of this coding style. I'd rather
define a generic macro in such an #if block, and
use it everywhere I need it. So that's what I did.
I've written macros that create TCP or UDP
sockets, connect to remote endpoints, and send
and receive data via UDP and TCP. Then I wrote
several generic macros that follow the pattern
below:

C++ / CLI
#if defined(USING_UDP)
#   define SOCKET_CREATE(sock) SOCKET_CREATE_UDP(sock)
#else
#   define SOCKET_CREATE(sock) SOCKET_CREATE_TCP(sock)
#endif
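As a compilable plain-C++ toy of the same select-the-macro-once pattern (the SOCKET_CREATE_* bodies here are stand-ins of my own, not the article's real socket macros):

```cpp
#include <string>

// The #if lives in ONE place; the rest of the code only ever uses the
// generic SOCKET_CREATE macro.  Toy bodies just record the flavour.
#define USING_UDP 1

#define SOCKET_CREATE_UDP(name) std::string name("udp-socket")
#define SOCKET_CREATE_TCP(name) std::string name("tcp-socket")

#if defined(USING_UDP)
#   define SOCKET_CREATE(name) SOCKET_CREATE_UDP(name)
#else
#   define SOCKET_CREATE(name) SOCKET_CREATE_TCP(name)
#endif

std::string make_socket_kind()
{
    SOCKET_CREATE(sock);   // expands to the UDP flavour here
    return sock;
}
```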

You get the idea, right? No #ifdefs inside the real code. I just write SOCKET_CREATE (socket); and the preprocessor generates the code to create the appropriate socket. Here's another good macro I use for exception handling, but before that I'll give you some rules (you probably know) about .NET exception handling:

 Catch only the exceptions you can handle, and no more. This means that if you expect the method you're calling to throw ArgumentNullException and/or ArgumentOutOfRangeException, you should write two catch clauses and catch only these exceptions.
 Another rule is to never "swallow" an exception you caught but cannot handle. You must re-throw it, so the caller of your method knows why it failed.
 This one relates to the 2nd rule: there are 2 exceptions you can do nothing about but report them to the user and die: these are OutOfMemoryException and ExecutionEngineException. I don't know which one is worse -- probably the latter, though if you're out of memory, there's almost nothing you can do about it.

Because I'm not writing production code here, I allowed myself to catch (in most of the source code) all possible exceptions when I don't need to handle them, except to know that something went bad. So I catch the base class Exception. This violates all the rules I've written above, but I wrote some code to fit into the second and third one -- if I catch an OutOfMemoryException or ExecutionEngineException, I re-throw it immediately. Here's the macro I call after I catch the generic Exception class:

C++ / CLI
#define TRACE_EXCEPTION_AND_RETHROW_IF_NEEDED(e) \
    System::Type __gc* exType = e->GetType (); \
    if (exType == __typeof (OutOfMemoryException) || \
        exType == __typeof (ExecutionEngineException)) \
        throw; \
    Console::WriteLine ( \
        S"\n{0}\n{1} ({2}/{3}): {4}\n{0}", \
        new String (L'-', 79), \
        new String ((char *) __FUNCTION__), \
        new String ((char *) __FILE__), \
        __box (__LINE__), \
        e->Message);

And finally, a word about assertions. C has the assert macro, VB had the Debug.Assert method, and .NET has a static Assert method in the Debug class too. One of the overloads of the method takes a boolean expression and a string describing the test. C's assert is smarter. It just needs an expression, and it builds the string containing the expression automatically by stringizing the expression. Now, I really hate the fact that C# lacks some real preprocessing features. However, MC++ (thank God!) was not slaughtered by the compiler writers (long live legacy code support), so here's my .NET version of C's assert macro:

C++ / CLI
#define DEBUG_ASSERT(x) Debug::Assert (x, S#x)

If I were writing the code for this article in C#, I would have had to type

c#
Debug.Assert (null != objRef, "null != objRef");

everywhere I needed to assert. In MC++, I just write

C++ / CLI
DEBUG_ASSERT (0 != objRef);

and it is automatically expanded into

C++ / CLI
Debug::Assert (0 != objRef, S"0 != objRef");

Not to speak about the __LINE__, __FILE__ and __FUNCTION__ macros I could use in the DEBUG_ASSERT macro! Now let's everybody scream loudly with me: "C# sucks!":)
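Here is a plain standard-C++ sketch of the same stringizing trick. The assert_impl helper and last_assert_message are my stand-ins; the article's macro forwards to Debug::Assert instead:

```cpp
#include <cstdio>
#include <string>

// Captures the last failed assertion's message instead of aborting,
// so the macro's expansion can be observed.  Hypothetical harness.
static std::string last_assert_message;

static void assert_impl(bool ok, const char* expr, const char* file, int line)
{
    if (!ok)
    {
        char buf[256];
        // '#x' in the macro below turned the expression into the
        // string literal we receive here as 'expr'
        std::snprintf(buf, sizeof buf, "%s(%d): assertion failed: %s",
                      file, line, expr);
        last_assert_message = buf;
    }
}

#define DEBUG_ASSERT(x) assert_impl((x), #x, __FILE__, __LINE__)
```

Writing DEBUG_ASSERT (v != 0) produces a message containing the literal text "v != 0", plus the file and line the author wishes C#'s Debug.Assert could supply for free.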

Tweaking the configuration file

I know you're all smart guys (otherwise what the heck are you doing on CodeProject?:), and smart guys don't need lengthy explanations; all they need is to take a look at an example. So here it is -- the XML configuration file, used by both the Machine Load Monitoring and Reporting servers. The explanation of all elements is given below the file:

XML
<?xml version="1.0" encoding="utf-8"?>

<configuration>

  <LoadReportingServer>
    <IpAddress>127.0.0.1</IpAddress>
    <Port>12000</Port>
    <ReportingInterval>2000</ReportingInterval>
  </LoadReportingServer>

  <LoadMonitoringServer>
    <IpAddress>127.0.0.1</IpAddress>
    <CollectorPort>12000</CollectorPort>
    <CollectorBacklog>40</CollectorBacklog>
    <ReporterPort>13000</ReporterPort>
    <ReporterBacklog>40</ReporterBacklog>
    <MachineReportTimeout>4000</MachineReportTimeout>
    <RemotingProtocol>tcp</RemotingProtocol>
    <RemotingChannelPort>14000</RemotingChannelPort>

    <PerformanceCounters>
      <counter alias="cpu"
               category="Processor"
               name="% Processor Time"
               instance="_Total"
               load-weight="0.3"
               interval="500"
               maximum-measures="5" />
      <!-- ... -->
    </PerformanceCounters>

  </LoadMonitoringServer>

</configuration>

Even though you're smart, I know that some of you have some questions, which I am about to answer. First, I'm going to explain the purpose of all elements and their attributes, and I'll cover some weird settings, so read on... (to save some space, I'll refer to the element LoadReportingServer as LRS, and I'll write LMS instead of LoadMonitoringServer).

Element/Attribute -- Meaning/Usage

LRS/IpAddress
    When you're using UDP + multicasting (the default), the IpAddress is the IP address of the multicast group that MLMS and MLRS join in order to communicate. If you're not using multicasting, but are still using UDP or TCP, this element specifies the IP address (or the host name) of the MLMS server the MLRS servers report to. Note that because you don't use multicasting, there's no way for the MLRS servers to "multicast" their machine loads to all MLMS servers. There's no doubt that this element's text should be equal to LMS/IpAddress in any case.

LRS/Port
    Using UDP + multicasting, UDP only or TCP, that's the port to which MLRS servers send, and at which MLMS servers receive, machine loads.

LRS/ReportingInterval
    MLRS servers report machine loads to MLMS ones. The ReportingInterval specifies the interval (in milliseconds) at which a MLRS server should report its load to one or more MLMS servers. If you have paid attention in the Some Implementation Details section, I said that even if the interval has elapsed, a machine may not report its load, because it has not gathered the raw data it needs to calculate its load. See the counter element's interval attribute for more information.

LMS/IpAddress
    In the UDP + multicasting scenario, that's the multicast group's IP address, as in the LRS/IpAddress element. When you're using UDP or TCP only, this address is ignored.

LMS/CollectorPort
    The port on which MLMS servers accept TCP connections, or receive data from, when using UDP.

LMS/CollectorBacklog
    This element specifies the maximum number of sockets a MLMS server will use when configured for TCP communication.

LMS/ReporterPort
    If you haven't been reading the article carefully, you're probably wondering what this element specifies. Well, in my first design, I was not thinking that Remoting would serve me so well to build the Load Balancing Library (LBL). I wrote a mini TCP server, which was accepting TCP requests and returning the least loaded machine. Because LBL had to connect to an MLMS server and ask which is the fastest machine, you can imagine that I've written several overloads of the GetLeastLoadedMachine method, accepting timeouts and default machines, if there're no available machines at all. At the moment I finished the LBL client, I decided that the design was too lame, so I rewrote the LBL library from scratch (yeah, shit happens:), using Remoting. Now, I regret to tell you that I've overwritten the original library's source files. However, I left the TCP server completely working -- it lives as the ReporterWorker class, and persists in the ReporterWorker.h/.cpp files in the LoadMonitoringLibrary project. If you want to write an alternative LBL library, be my guest -- just write some code to connect to the LMS reporter worker and it will report the fastest machine's load immediately. Note that the worker is accepting TCP sockets, so you should always connect to it using TCP.

LMS/ReporterBacklog
    It's not difficult to figure out that this is the backlog of the TCP server I was talking about above.

LMS/MachineReportTimeout
    Now that's an interesting setting. The MachineReportTimeout is the biggest interval (in milliseconds) at which a machine should report its successive load in order to stay in the load balancing. This means that if a machine has reported 5 seconds ago, and the timeout interval is set to 3 seconds, the machine is removed from the load balancing. If it later reports, it is back in business. I think this is a bit lame, because one would like to configure each machine to report in different intervals, but I don't have time (now) to fix this, so you should learn to live with this "feature". One way to work around my "lameness" is to give this setting a great enough value. Be warned, though, that if a machine is down, you won't be able to remove it from the load balancing until this interval elapses -- so don't give it too big values.

LMS/RemotingProtocol
    Originally, I thought to use Remoting only over TCP. I thought that HTTP would be too slow (it is one level above TCP in the OSI stack). Then, after I recalled how complex Remoting was, I realized that the HTTP protocol is blazingly fast compared to the overhead of Remoting itself. So I decided to give you an option which protocol to use. Currently, the solution supports only the TCP and HTTP protocols, but you can easily extend it to use any protocol you wish. This setting accepts a string, which is either "tcp" or "http" (without the quotes, of course).

LMS/RemotingChannelPort
    That's the port MLMS uses to register and activate the load balancing object with the Remoting runtime.

LMS/PerformanceCounters
    This element contains a collection of performance counters, used to calculate the machine's load. Below are the attributes of the counter XML element, used to describe a CounterInfo object I wrote about somewhere above.

counter/alias
    Though currently not used, this attribute specifies the alias for the otherwise too long performance counter path. See the TODO(s) section for the reason I've put this attribute.

counter/category
    The general category of the counter, e.g. Processor, Memory, etc.

counter/name
    The specific counter in the category, e.g. % Processor Time, Page reads/sec, etc.

counter/instance
    If there are two or more instances of the counter, the instance attribute specifies the exact instance of the counter. For example, if you have two CPUs, then the first CPU's instance is "0", the second one is "1", and the instance that aggregates both is named "_Total".

counter/load-weight
    The weight that balances the counter values. E.g. you can give more weight to the values of Processor\% Processor Time\_Total than to Processor\% User Time\_Total ones. You get the idea.

counter/interval
    The interval (in milliseconds) at which a performance counter is asked to return its next sample value.

counter/maximum-measures
    The size of the cyclic queue (I talked about above) that stores the transient state of a performance counter. In other words, the element specifies how many counter values should be collected in order to get a decent weighted average (WA). The counter does not report its WA until it collects at least maximum-measures sample values. If the CounterInfo class is asked to return its WA before it collects the necessary number of sample values, it blocks and waits until it has collected them.
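To make the maximum-measures behaviour concrete, here is a small standard-C++ sketch of such a cyclic sample queue. The names are mine, not the article's; the real CounterInfo class also weights newer samples more and blocks until the queue fills, which this toy merely signals via Ready:

```cpp
#include <cstddef>
#include <vector>

// A fixed-size cyclic queue of counter samples: once full, each new
// sample overwrites the oldest one.  The average is only meaningful
// after maximum_measures samples have been collected.
class SampleQueue
{
public:
    explicit SampleQueue(std::size_t maximum_measures)
        : samples_(maximum_measures, 0.0), next_(0), filled_(0) {}

    void Add(double sample)
    {
        samples_[next_] = sample;
        next_ = (next_ + 1) % samples_.size();
        if (filled_ < samples_.size())
            ++filled_;
    }

    bool Ready() const { return filled_ == samples_.size(); }

    // plain average here; the real code computes a weighted one
    double Average() const
    {
        double sum = 0.0;
        for (std::size_t i = 0; i < filled_; ++i)
            sum += samples_[i];
        return filled_ ? sum / filled_ : 0.0;
    }

private:
    std::vector<double> samples_;
    std::size_t next_, filled_;
};
```

With maximum-measures="3", the queue is not Ready until three samples arrive; a fourth sample then pushes the first one out.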

... and the other configuration file:)

Which is "the other configuration file"? Well, it is the Web.config file in the sample load-balanced web application. It has 3 vital keys defined in the appSettings section. They are the machine on which MLMS runs, and the Remoting port and protocol where that machine has registered its remoted object.

XML
<appSettings>
    <add key="LoadBalancingMachine" value="..." />
    <add key="LoadBalancingPort" value="..." />
    <add key="LoadBalancingProtocol" value="..." />
</appSettings>

You can figure out what the keys mean, as you have seen the code in the Helper class of the web application. The last key accepts a string, which can be either "TCP" or "HTTP" and nothing else.

Deployment

There are 7 ways to deploy the solution onto a single machine. That's right -- seven. To shorten the article and lengthen my life, I'll refer to the Machine Load Monitoring Server as LMS, to the Machine Load Reporting Server as LRS, and the Load Balancing Library as LBL. Here're the variations:

1. LMS, LRS, LBL
2. LMS, LRS
3. LMS, LBL
4. LMS
5. LRS, LBL
6. LRS
7. LBL

It is you who decide what to install and where. But it is me who developed the setup projects, so you have to pay some attention to what I'm about to tell you. There are 4 setups. The first one is for the sample load-balanced web application. The second one is for the server part of the solution, i.e. the Machine Load Monitoring and Reporting Servers. They're bundled in one single setup, but it's your call which one you run once you've installed them. The 3rd setup contains only the load balancing library, and the 4th one contains the entire source code of the solution, including the source for the setups.

Here is a simple scenario to test whether the code works (you should have set up a multicast group on your LAN, or ask an admin to do that). We'll use 2 machines -- A and B. On machine A, build the SharedLibrary project first, then build the whole solution (you may skip the setup projects). Modify the XML configuration file for MLMS and MLRS. Run the servers. Deploy the web application, modify its Web.config file and launch it. Click on the web page's link. It should work, and the load balancing should redirect you to the same machine (A). Now deploy only MLRS and the web application to machine B. Modify the configuration files, but this time, in Web.config, set the LoadBalancingMachine key to "A". You've just explained to B's LBL to use machine A's remoted load balancing object. Run MLRS on machine B. It should start to report B's load to A's MLMS. Now do some CPU-intensive operation (if < WinXP, right-click the Desktop and drag your mouse behind the Task Bar; this should give you about 100% CPU utilization) on machine A. Its web application should redirect you to the web app on machine B. Now stop B's MLRS server. Launch B's web application. It should redirect you to A's one. I guess that's it. Enjoy playing around with all possible deployment scenarios:)

Some thoughts about MC++ and C#

Managed C++ to C# translation

There's nothing easier than converting pure managed C++ code to C#. Just press Ctrl-H in your mind and replace the following sequences (this will work only for my source files, as other developers may not use whitespace the same way I do).

MC++                 C#
----                 ----
::                   .
->                   .
__gc*
__gc
__sealed __value
using namespace      using
: public             :
S"                   "
__box (x)            x

While the replacements above will translate 85% of the code, there are several things you should do manually:

 You have to translate all preprocessor directives, e.g. remove the header guards (#if !defined (...) ... #define ... #endif), and manually replace the macros with the code they are supposed to generate.
 You have to convert all C++ casts to C# ones, i.e.

C++ / CLI
static_cast<SomeType __gc*> (expression)

to

c#
((SomeType) expression) or (expression as SomeType)

 You have to put the appropriate access modifier keyword on all members in a class, i.e. you should change:

C++ / CLI
PUBLIC:
    ... Method1 (...) {...}
    ... Variable1;
PRIVATE:
    ... Method3 (...) {...}

to

c#
public ... Method1 (...) {...}
public ... Variable1;
private ... Method3 (...) {...}

 You have to combine the header and the implementation files into a single C# source file.

C#'s readonly fields vs MC++ non-static const members

It is really frustrating that MC++ does not have an equivalent of C#'s readonly fields (not properties). In C# one could write the following class:

c#
public class PerfCounter
{
    public PerfCounter (String fullPath, int sampleInterval)
    {
        // validate parameters
        //
        Debug.Assert (null != fullPath);
        if (null == fullPath)
            throw (new ArgumentNullException ("fullPath"));
        Debug.Assert (sampleInterval > 0);
        if (sampleInterval <= 0)
            throw (new ArgumentOutOfRangeException ("sampleInterval"));

        // assign the values to the readonly fields
        //
        FullPath = fullPath;
        SampleInterval = sampleInterval;
    }

    // these are marked public, and make a great replacement of
    // read-only (getter) properties
    //
    public readonly String FullPath;
    public readonly int SampleInterval;
}

You see that the C# programmer doesn't have to implement read-only properties, because the readonly fields are good enough. In Managed C++, you can simulate the readonly fields by writing the following class:

C++ / CLI
public __gc class PerfCounter
{
public:
    PerfCounter (String __gc* fullPath, int sampleInterval) :
        FullPath (fullPath),
        SampleInterval (sampleInterval)
    {
        // validate parameters
        //
        Debug::Assert (0 != fullPath);
        if (0 == fullPath)
            throw (new ArgumentNullException (S"fullPath"));
        Debug::Assert (sampleInterval > 0);
        if (sampleInterval <= 0)
            throw (new ArgumentOutOfRangeException (S"sampleInterval"));

        // the values have been assigned in the initialization list
        // of the constructor, we have nothing more to do -- COOL!
        //
    }

public:
    const String __gc* FullPath;
    const int SampleInterval;
};

So far, so good. You're probably wondering why I am complaining about MC++. It looks like the MC++ version is even cooler than the C# one. Well, the example class was too simple. Now imagine that when you find an invalid parameter, you should change it to a default value, like in the C# class below:

c#
public class PerfCounter
{
    public PerfCounter (String fullPath, int sampleInterval)
    {
        // validate parameters
        //
        Debug.Assert (null != fullPath);
        if (null == fullPath)
            throw (new ArgumentNullException ("fullPath"));
        Debug.Assert (sampleInterval > 0);

        // change to a reasonable default value
        //
        if (sampleInterval <= 0)
            sampleInterval = DefaultSampleInterval;

        // you can STILL assign the values to the readonly fields
        //
        FullPath = fullPath;
        SampleInterval = sampleInterval;
    }

    public readonly String FullPath;
    public readonly int SampleInterval;
    private const int DefaultSampleInterval = 1000;
}

Now, the corresponding MC++ code will not compile, and you'll see why below:

C++ / CLI
public __gc class CrashingPerfCounter
{
public:
    CrashingPerfCounter (String __gc* fullPath, int sampleInterval) :
        FullPath (fullPath),
        SampleInterval (sampleInterval)
    {
        // validate parameters
        //
        Debug::Assert (0 != fullPath);
        if (0 == fullPath)
            throw (new ArgumentNullException (S"fullPath"));
        Debug::Assert (sampleInterval > 0);

        // the second line below will cause the compiler to
        // report "error C2166: l-value specifies const object"
        //
        if (sampleInterval <= 0)
            SampleInterval = DefaultSampleInterval;

        // the values have been assigned in the initialization list
        // of the constructor, and that's the only place we can
        // initialize non-static const members -- NOT COOL!
        //
    }

public:
    const String __gc* FullPath;
    const int SampleInterval;

private:
    static const int DefaultSampleInterval = 1000;
};

Now, one may argue that we could initialize the const member SampleInterval in the initialization list of the constructor like this:

C++ / CLI
SampleInterval (sampleInterval > 0 ? sampleInterval : DefaultSampleInterval)

and he would be right. However, if we need to connect to a database first in order to do the check, or we need to perform several checks on the parameter, I can't figure out how to do this in the initialization list. Do you? That's why MC++ sucks compared to C# for readonly fields. Now the programmer is forced to make the const fields non-const and private, and write code to implement read-only properties, like this:

C++ / CLI
public __gc class LamePerfCounter
{
public:
    LamePerfCounter (String __gc* fullPath, int sampleInterval)
    {
        // validate parameters
        //
        Debug::Assert (0 != fullPath);
        if (0 == fullPath)
            throw (new ArgumentNullException (S"fullPath"));
        Debug::Assert (sampleInterval > 0);
        if (sampleInterval <= 0)
            sampleInterval = DefaultSampleInterval;

        // assign the values to the member variables
        //
        this->fullPath = fullPath;
        this->sampleInterval = sampleInterval;
    }

    __property String __gc* get_FullPath ()
    {
        return (fullPath);
    }

    __property int get_SampleInterval ()
    {
        return (sampleInterval);
    }

private:
    String __gc* fullPath;
    int sampleInterval;
    static const int DefaultSampleInterval = 1000;
};
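For the record, standard C++ does offer one way to keep the field const while still running arbitrarily complex checks: hoist the validation into a static helper and call it from the initializer list. A hedged sketch in plain C++ (the names are mine, not the article's code):

```cpp
#include <string>

// The const member is initialized exactly once, in the initializer
// list, but the value it receives has already been validated by a
// static helper that can contain any amount of logic.
class PerfCounterSketch
{
public:
    PerfCounterSketch(const std::string& fullPath, int sampleInterval)
        : FullPath(fullPath),
          SampleInterval(ValidateInterval(sampleInterval))
    {
    }

    const std::string FullPath;
    const int SampleInterval;

private:
    static int ValidateInterval(int interval)
    {
        // as many checks (or even database lookups) as you like fit here
        if (interval <= 0)
            return DefaultSampleInterval;
        return interval;
    }

    static const int DefaultSampleInterval = 1000;
};
```

Whether the old MC++ compiler handled this pattern as gracefully as standard C++ does is another question; it is offered here only as the usual C++ answer to the initializer-list limitation.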
"Bugs suck. Period."

John Robbins

"I trust that I and my colleagues will use my code correctly. To avoid bugs, however, I verify everything. I verify the data that others pass into my code, I verify my code's internal manipulations, I verify every assumption I make in my code, I verify data my code passes to others, and I verify data coming back from calls my code makes. If there's something to verify, I verify it. This obsessive verification is nothing personal against my coworkers, and I don't have any psychological problems (to speak of). It's just that I know where the bugs come from; I also know that you can't let anything by without checking it if you want to catch your bugs as early as you can."

John Robbins

I do what John preaches in his book, and you should do it too. Trust me, but verify my code. I think I have debugged my code thoroughly, and I haven't met bugs in it since the day before yesterday (when I started to write the article). However, if you see one of those nasty creatures, let me know. My e-mail is stoyan_damov[at]hotmail.com.

Though I think I don't have bugs (statistically, I should have 8 bugs in the 4 KLOC), I'd love to share an amazing Microsoft bug with you. It cost me quite some time to find it, but unfortunately, I was not able to reproduce it when it disappeared (yes! it disappeared) later. I've wrapped all of my classes in the namespace SoftwareLoadBalancing. So far so good. Now I have several shared classes in the SharedLibrary assembly. The Load Monitoring Library uses one of these classes to do its job, so it is #using the SharedLibrary. I was able to build LML several times, and then suddenly the linker complained that it cannot find the shared class I was using in the namespace SoftwareLoadBalancing. I'll name that class X to save myself some typing. I closed the solution, went to the Debug folder of the shared library, deleted everything, deleted all files in the common Bin folder and tried again. Same result! I let the linker grumble for three more tries and then launched the ILDasm tool. When I looked at the SharedLibrary.dll, I found that the class X was "wrapped" twice in the namespace SoftwareLoadBalancing, i.e. it was now SoftwareLoadBalancing::SoftwareLoadBalancing::X. Because I wanted to do some tests and had no time to deal with the bug, I tried to alias the namespace in LML like this:

C++ / CLI
using namespace SLB = SoftwareLoadBalancing;

Then, I tried to access the X class using the following construct:

C++ / CLI
SLB::SLB::X __gc* x = new SLB::SLB::X ();

Maybe I don't understand C++ namespace aliasing very well, or maybe the documentation does not explain it, but what happened this time was that the linker complained again that it can't find the SoftwareLoadBalancing::SLB::X class!!! The compiler "replaced" SLB with SoftwareLoadBalancing only once. Needless to say, I was quite embarrassed. Not only had the compiler put my class wrapped in two namespaces, but it was not helping me work around the problem!:) Do you know what I did then? I aliased the namespace in a way the linker or compiler should understand:

C++ / CLI
using namespace SLB = SoftwareLoadBalancing::SoftwareLoadBalancing;

Then, I tried to instantiate the X class like this:

C++ / CLI
SLB::X __gc* x = new SLB::X ();

I'm sure you don't know what happened then, because I was hiding a simple fact from you. I was rebuilding each time. Now are you able to guess what happened? The linker complained again that it cannot find class X in the namespace SoftwareLoadBalancing::SoftwareLoadBalancing. WTF?! I was furious! I went crazy! I launched ILDasm once again, and looked at the SharedLibrary. The class was properly wrapped once in the namespace SoftwareLoadBalancing. Now, I don't know if this is a bug in the compiler or in the linker or in my mind. What I know is that when I have such a problem next time, I won't go chasing nonexistent bugs in my source files, but will launch my beloved ILDasm and see whether I'm doing something wrong, or Microsoft is trying to drive me crazy:)

TODO(s)

(So what the heck have you done, when there are so many TODOs?!)

 Re-write the MLMS and MLRS executables into .NET Windows services?
 No management console -- no time for that, but who knows, maybe in the next version of the article I'll build a configuration GUI and write some code to enable remote server management.
 In this version, the machine load is calculated almost statically, i.e. counters and their weights are configurable, but the algorithm to calculate the machine load is the same. If I have some time (e.g. my lovely wife goes on vacation to her home town for a while :), I'll implement an expression interpreter so you guys would be able to type arithmetic and boolean expressions (formulas) to calculate the machine load as you wish, i.e. one could type expressions like:
 cpu * 0.2 + ((sessions < 10) * 0.2 + (sessions >= 10) * 0.5) * sessions
 Discover a better way to store machine loads (e.g. implementing a real priority queue), though the current one serves me well:)
 Do you remember what I've written earlier in the article about returning the least loaded machine? "Now, if anyone asks for the fastest machine, we will return the first element of the ArrayList, that is stored in the first element of the SortedList, right?". Though I may seem to think right, actually I'm a bit wrong. Imagine that machine B has reported the least load of all 5 machines. Now imagine that 100 queries for the least loaded machine arrive at the MLMS. If we manage to answer the queries before machine B reports its load again, we will send machine B as the fastest machine to all 100 queries. Once the clients receive the answer, they'll all rush to overload machine B, so next time it may not even be able to report its load:) What we have to do is either report the fastest machine in some round-robin way, or return a random machine from the list of fastest machines. But that's something I'll implement in the next version of the article.
 Change the Grim Reaper's code so that it does not remove a machine the moment it hasn't reported in time (which could be due to UDP packet loss, not because the machine is dead). Rather, have a configurable counter that decrements each time a machine fails to report its load in time, and when the counter reaches zero, the machine is removed from the load balancing.
 SECURITY -- there's no security code besides checking the parameters in the methods, and the validity of the TCP/UDP requests that LMS receives. If Michael Howard or David LeBlanc (the authors of the must-read book "Writing Secure Code") have read this article, I'm sure they'd have rated it zero. I'm sorry! If I'd implemented security, there would be no article this month, and I really want to win CodeProject's December contest:)
 "Clean up" some classes. They use other classes' internal members directly, and that's not very cool OO programming, don't you think?
 Code coverage -- I know that's not an item which should go on a TODO list, but though I could swear I've checked the entire code, do not trust me; experiment with putting the code in some really weird situations.
 Build an NDoc-generated .CHM documentation? Really, I don't have the time. I work round the clock to meet some "Mission Impossible" deadlines, and I steal from my teeny-weeny sleep time to write articles like this. Maybe some day, maybe.
 Write your feature request in the message board below, and I'll consider implementing it for the next version of the article.
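The round-robin idea from the TODO list above can be sketched in a few lines of standard C++ (hypothetical names; the real fix would live inside the MLMS load-balancing object):

```cpp
#include <cstddef>
#include <string>
#include <vector>

// When several machines share the same (least) load, rotate through
// them instead of always handing out the first one, so 100 back-to-back
// queries don't all stampede the same server.
class LeastLoadPicker
{
public:
    explicit LeastLoadPicker(const std::vector<std::string>& leastLoaded)
        : machines_(leastLoaded), next_(0) {}

    std::string Pick()
    {
        const std::string& m = machines_[next_];
        next_ = (next_ + 1) % machines_.size();   // advance the cursor
        return m;
    }

private:
    std::vector<std::string> machines_;
    std::size_t next_;
};
```

Returning a random element of the tie set would serve the same purpose; round-robin just guarantees an even spread.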
Conclusion

Thank you for reading the article! It was the longest article I've written in my entire life (I started writing articles several months ago:) I'm really impressed by how patient you are! Now that I've thanked you, I should also say "Thanks!" to Microsoft, which brought us the marvelous .NET technology. If .NET did not exist, I doubt I would have written such an article, and even if I had, it wouldn't include the C++ source code. The .NET framework makes programming so easy! It just forces you to write all day long:) I feel like I'm not programming, but rather prototyping. It is easier than VB once was. Really!

Now let's see what you've learned (or just read) from the article:

 what load balancing is in general
 my idea for dynamic software load balancing, the architecture and some of the implementation details
 some multithreading issues and how to solve them
 network programming basics, including TCP, UDP and multicasting
 some (I hope) helpful tips and workarounds
 that if COM is love, then .NET is PASSION
 that I'm a pro-Microsoft guy:)

Aaaaaaaaaaaaa, I forgot to tell you! Please do not post messages in the message board below that teach me not to use __gc*, when I could just type *. I just love the __gc keyword, that's it:)

Below are two books I read once upon a time that served me well in writing this article's source code. You'll be surprised that they are not .NET books. I'm not joking -- there are maybe over 250 .NET books, and I've read 10 or so; that's why I can't recommend you any .NET book, really. It wouldn't be fair to say "Book X is the best on topic Y" when I haven't read at least half of the .NET books out there to give you (authoritative) advice. The books below are not just "Must-Have" and "Must-Have-Read":) ones. They are priceless for the Windows developer. Stop reading this article, and go buy them now! :)

Programming Server-Side Applications for Microsoft Windows 2000 (ISBN 0-7356-0753-2) by Jeffrey Richter, Jason D. Clark

"We developers know that writing error-tolerant code is what we should do, but frequently we view the required attention to detail as tedious and so omit it. We've become complacent, thinking that the operating system will 'just take care of us.' Many developers out there actually believe that memory is endless, and that leaking various resources is OK because they know that the operating system will clean up everything automatically when the process dies. Certainly many applications are implemented in this way, and the results are not devastating because the applications tend to run for short periods of time and then are restarted. However, services run forever, and omitting the proper error-recovery and resource-cleanup code is catastrophic!"

Debugging Applications (ISBN 0-7356-0886-5) by John Robbins

"Bugs suck. Period. Bugs are the reason you endure death-march projects with missed deadlines, late nights, and grouchy coworkers. Bugs can truly make your life miserable because if enough of them creep into your software, customers will stop using your product and you could lose your job. Bugs are serious business... As I was writing this book, NASA lost a Mars space probe because of a bug that snuck in during the requirements and design phase. With computers controlling more and more mission-critical systems, medical devices, and superexpensive hardware, bugs can no longer be laughed at or viewed as something that just happens as a part of development."

And the best text on Managed Extensions for C++ .NET (besides the specification and the migration guide) is not a book, but a Microsoft Official Curriculum (MOC) course: "Programming with Managed Extensions for Microsoft Visual C++ .NET" (2558). You should definitely take this course if you're planning to do any .NET development using Managed C++.

A (final) word about C#

Many of you are probably wondering why I have implemented this solution in Managed C++, and not in C#, since I'm writing only managed code. I know that most .NET developers are on the C# bandwagon. So am I. I'm programming in C# all day long. That's my job. However, I love C++ so much that I prefer to write in Managed C++. There are probably hundreds of reasons I prefer MC++ to C#, and IJW (and unmanaged code in general) is probably the last on the list. C# has nothing to do with C or C++, except for some slight similarities in the syntax, no matter that Microsoft is trying to convince us of the opposite. It is way closer to Java than to C/C++. Do I sound extreme? Well, a C/C++ die-hard colleague of mine (Boby -- http://606u.dir.bg/) forced himself to learn and use VB.NET in order not to forget C++ while he was developing a .NET application. Now who's extreme?:) Microsoft is pushing us very hard to forget C++, so that they and some open-source C++ die-hards are the only ones who use it:) Haven't you noticed? As of today, the ATL list generates a couple of unique posts each day, compared to at least 10-20 several months ago, before Microsoft suddenly decided to drop it, and put it back in a week when catcalled by the ATL community. And what does Microsoft say about this, eh? COM is not dead! COM is here to stay! ATL lives (somewhere in time). Blah, blah:) I adore .NET, but I don't want to have to search for "The C Programming Language" and "The C++ Programming Language" books on the dusty bookshelves in 3 or 4 years, when .NET is replaced by some even-cooler technology. Long live C++!:) Anyway, I was tempted to implement the solution in C#, because the monthly prizes for it were better-looking:)

Disclaimer

This software comes "AS IS" with all faults and with no warranties whatsoever. If you find the source code or the article useful, wish me a Merry Christmas:)

Wishing you the merriest Christmas, and looking forward to seeing you next year:)
Stoyan
