Garbage Collection Introduction
Garbage collection is an important technique in the .NET Framework: it frees managed
objects that are no longer used and returns the space to the process. This article explains
the basics of garbage collection.
Garbage collection (GC) is a feature introduced with the Microsoft .NET Framework. When a
class is instantiated at run time, the resulting object is allocated space in heap memory,
and everything the object does happens within that allotted space. Once the work involving
the object is finished, the object remains in memory as unused space.
Earlier Microsoft products required the programmer to clear an object from memory once it
was no longer needed. In Visual Basic, for instance, when an object had finished its work we
had to set it to Nothing, which released its memory back to the process.
Microsoft wanted a way to automate the cleanup of unused heap memory after an object's
lifetime ends, and introduced garbage collection for that purpose. It is a very important
part of the .NET Framework: objects are now cleared from memory implicitly, which replaces
the earlier explicit cleanup of unused memory.
Garbage Collection
The heap memory is divided into a number of generations, normally three. Generation 0 is
for short-lived objects, Generation 1 is for medium-lived objects that have been moved from
Generation 0, and Generation 2 holds mostly stable, long-lived objects.
When an object is created, its memory is allocated at the next higher free address in
Generation 0. Allocation is contiguous, with no gaps between the generations of the garbage
collector.
How it works
Implicit garbage collection is handled by the .NET Framework. When an object is created, it
is placed in Generation 0. The garbage collector uses an algorithm that checks the objects
in a generation; an object whose lifetime is over is removed from memory. There are two
kinds of objects: live objects and dead objects. The garbage collection algorithm collects
all the unused (dead) objects in the generation. Live objects that survive for a long time
are moved to the next generation.
An object is not cleaned up at the exact moment its lifetime ends. The collector runs its
sweeping algorithm on its own schedule to free the space for the process.
Exception Handling
Garbage collection is designed to reclaim free memory implicitly. But, as I said, it runs
the collection algorithm on its own schedule.
If we want to force a collection of unused objects, or explicitly release a particular
object from memory, we can do so from code, which clears the object from the heap right
away.
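A minimal sketch of forcing a collection from code, assuming the usual pattern of GC.Collect followed by GC.WaitForPendingFinalizers. The class name ForceCollectDemo and the helper AllocateGarbage are made up for illustration, and how much memory is actually reclaimed depends on the runtime and build configuration.

```csharp
using System;

class ForceCollectDemo
{
    // Hypothetical helper: allocates short-lived arrays that become
    // garbage as soon as each loop iteration ends.
    static void AllocateGarbage()
    {
        for (int i = 0; i < 1000; i++)
        {
            byte[] buffer = new byte[1024];
            buffer[0] = 1;
        }
    }

    // Forces a full collection and returns how many generation 0
    // collections were counted while doing so.
    public static int ForceCollection()
    {
        int before = GC.CollectionCount(0);
        GC.Collect();                    // collect all generations
        GC.WaitForPendingFinalizers();   // let finalizers of dead objects run
        GC.Collect();                    // reclaim objects released by those finalizers
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        AllocateGarbage();
        Console.WriteLine("Memory before: " + GC.GetTotalMemory(false));
        Console.WriteLine("Collections forced: " + ForceCollection());
        Console.WriteLine("Memory after: " + GC.GetTotalMemory(false));
    }
}
```

Forcing a collection like this is rarely needed in ordinary application code; it is shown here only to make the explicit path visible.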
When it happens
The garbage collector periodically checks the heap and reclaims objects that no longer have
any valid references.
When an object is created, the runtime checks whether the heap has enough free space for
it. If it does not, the garbage collector automatically collects unused objects. If all
objects are still validly referenced, the runtime requests additional memory from the
operating system.
If an object is still referenced by other managed objects, its memory is not freed. The
collector cannot, however, track references held by unmanaged code, even when the
application forces a collection. This can be handled by writing explicit code so that
managed objects are not left referenced from unmanaged objects.
The System namespace of the .NET Framework contains the GC class, which exposes methods and
properties for working with garbage collection.
MaxGeneration
using System;
class GCExample1
{
    public static void Main(string[] args)
    {
        try
        {
            Console.WriteLine("GC Maximum Generations:" + GC.MaxGeneration);
        }
        catch (Exception oEx)
        {
            Console.WriteLine("Error:" + oEx.Message);
        }
    }
}
The MaxGeneration property returns the highest generation number the garbage collector
supports. Generations are numbered from 0, so the total number of generations is
MaxGeneration + 1. Here it returned 2, which means there are three generations in all:
Generation 0, Generation 1 and Generation 2.
GetTotalMemory and GetGeneration
using System;
class BaseGC
{
    public void Display()
    {
        Console.WriteLine("Example Method");
    }
}
class GCExample2
{
    public static void Main(string[] args)
    {
        try
        {
            Console.WriteLine("Total Memory:" + GC.GetTotalMemory(false));
            BaseGC oBaseGC = new BaseGC();
            Console.WriteLine("BaseGC Generation is :" + GC.GetGeneration(oBaseGC));
            Console.WriteLine("Total Memory:" + GC.GetTotalMemory(false));
        }
        catch (Exception oEx)
        {
            Console.WriteLine("Error:" + oEx.Message);
        }
    }
}
Here GetTotalMemory shows the number of bytes currently thought to be allocated on the
managed heap. I created one more managed object between the two calls, and after adding it
the reported memory size had increased.
The GetGeneration method reports which generation a particular managed object is in. Here
it shows that the object oBaseGC is in generation 0.
using System;
class Calci
{
    public int Add(int a, int b)
    {
        return (a + b);
    }
    public int Sub(int a, int b)
    {
        return (a - b);
    }
    public int Multi(int a, int b)
    {
        return (a * b);
    }
    public int Divide(int a, int b)
    {
        return (a / b);
    }
}
class GCExample3
{
    public static void Main(string[] args)
    {
        Calci oCalci = new Calci();
        Console.WriteLine("Calci object is now on " + GC.GetGeneration(oCalci) + " Generation");
        // Collection counts per generation before forcing anything.
        Console.WriteLine("Garbage Collection Occurred in 0th Generation:" + GC.CollectionCount(0));
        Console.WriteLine("Garbage Collection Occurred in 1st Generation:" + GC.CollectionCount(1));
        Console.WriteLine("Garbage Collection Occurred in 2nd Generation:" + GC.CollectionCount(2));
        // Force a collection of generation 0 only, then read its count again.
        GC.Collect(0);
        Console.WriteLine("Garbage Collection Occurred in 0th Generation:" + GC.CollectionCount(0));
    }
}
CollectionCount tells us how many garbage collections have occurred for a given generation.
As we know, the garbage collector has three generations in total. Here I passed 0, 1 and 2
as the argument to read the count for each generation. Initially the count for generation 0
was 0. Then I collected the unused objects in generation 0 through code and checked
CollectionCount for generation 0 again. Now it says 1.
The Collect method collects unreferenced objects in heap memory. It clears those objects
and reclaims their memory space.
Yeah, that's right, time to be a garbage guy. And if this line didn't make you laugh, you probably
know how bad a stand-up comedian I would make.
Now this is going to be long, so I'm going to jump right into what I want to talk about. Let's
start with a regular Joe who writes C#. Let's tell him to write a "safe"-looking block of code that
opens a gzip file, reads it as a byte array, decompresses the byte array using a buffer, writes it
to a memory stream and returns the result when he is done.
This is something you might expect in return: one block where the aforementioned byte array gets
decompressed, and a segment like this one where the file is read:
using System.IO;

class Program
{
    const int _max = 200000;
    static void Main()
    {
        // Read the whole compressed file into memory, then decompress it.
        byte[] array = File.ReadAllBytes("Capture.7z");
        Solution.Decompress(array);
    }
}
And yes, the code sample credit goes to dotnetperls. I started with an example before any
explanation because I want to build on it. I believe that knowing where you will end up sometimes
reinforces what you learn along the way. But it is very important to know why you end up here.
Breaking up the code bits
Let's start breaking the code up into bits. The first question you would ask a regular Joe like
me is how I can claim this code block is "safe", and what I mean when I say it is "safe".
The first answer would be that there is a beautiful using block there, which disposes the
resources when they are done being used.
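A minimal sketch of what a using block guarantees, assuming a made-up TrackedResource type that only records whether Dispose was called. It is not the code from the sample; it just shows that Dispose runs when the block ends, including when the block is left through an exception.

```csharp
using System;

// Hypothetical resource type that records whether Dispose was called.
class TrackedResource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

class UsingDemo
{
    // Returns true if the resource was disposed once the using block ended.
    public static bool DisposedAfterUsing()
    {
        TrackedResource resource = new TrackedResource();
        using (resource)
        {
            // Work with the resource here.
        }
        return resource.IsDisposed;
    }

    // Dispose also runs when the block exits through an exception.
    public static bool DisposedAfterException()
    {
        TrackedResource resource = new TrackedResource();
        try
        {
            using (resource)
            {
                throw new InvalidOperationException("simulated failure");
            }
        }
        catch (InvalidOperationException)
        {
            // Swallowed on purpose; only the disposal state matters here.
        }
        return resource.IsDisposed;
    }

    public static void Main()
    {
        Console.WriteLine(DisposedAfterUsing());      // True
        Console.WriteLine(DisposedAfterException());  // True
    }
}
```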
I emphasized three words here. Throughout this article I will bold words like this, and whenever I
do, if you don't know exactly what I'm talking about, put the word in a dictionary in your brain
and I will explain it in turn. That also means that if you think you know all of these, your journey
ends here.
Words of wisdom
The first word of wisdom here is dispose, and I will start with some basics for it. Before
going into dispose, we need some background on the .NET garbage collection process. If you are
totally new to this, the garbage collector is an automatic memory manager: it lets you develop
your application without having to free memory yourself every time. If that left you more
confused, you need to know what happens when you allocate memory, which happens every time you
declare and initialize a value or reference in C#.
Every time you use the new keyword to create an object in C#, you allocate the object on a
managed heap. The garbage collector automatically deallocates objects that are no longer being
used from the managed heap "some time in the future". If you are already curious what a managed
heap is, fret not. I will explain that too.
But before that, let's talk about some fundamentals of memory. When we write C#, we use a virtual
address space. Many processes on the same computer share the same memory (read: your RAM), and
they must not overlap with one another. Each process therefore addresses its own portion of
memory, and that is the virtual address space mapped for each process. By default, a process on a
32-bit computer has around 2GB of user-mode virtual address space. When you allocate memory, you
allocate it in this virtual address space, not in physical memory. The garbage collector likewise
works on this virtual space and frees up virtual memory for you automatically. Neat, huh?
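A small sketch for checking what kind of address space the current process has. The class name AddressSpaceDemo is made up for illustration; IntPtr.Size and Environment.Is64BitProcess only report the pointer width, not the exact amount of usable address space.

```csharp
using System;

class AddressSpaceDemo
{
    // Pointer width of the current process in bits (32 or 64).
    public static int PointerBits()
    {
        return IntPtr.Size * 8;
    }

    public static void Main()
    {
        Console.WriteLine("Pointer size: " + PointerBits() + " bits");
        Console.WriteLine("64-bit process: " + Environment.Is64BitProcess);
        Console.WriteLine("Managed heap in use: " + GC.GetTotalMemory(false) + " bytes");
    }
}
```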
I need memory
What actually happens when we instantiate, say, a harmless cat with the name Nerd? When you
compile this, the C# compiler generates common intermediate language (IL/CIL) code so that the JIT
compiler in the CLR can compile it for any possible machine configuration. You see, I used a lot
of jargon there; I didn't bold it because I'm not going to talk about it here. At run time, the
generated intermediate code makes the runtime do roughly the following:
1. Calculate the total amount of memory required for the object.
2. Look for enough free space in the managed heap.
3. When the object is created, return the reference to the caller and advance the next object
pointer to the next available slot on the managed heap.
I'm quite sure number 1 is very easy to understand here. Why would we need to look for space
in the managed heap then? Let's look at this first.
If you look at the example here, it should now be pretty clear what I meant. If the next object
pointer doesn't find enough space to fit the next object in, you can expect
an OutOfMemoryException. This can also happen when you don't have enough physical memory.
This picture can also mislead you, and I will come to that now. You might think virtual address
space is always contiguous. Well, it's not. Virtual address space can be fragmented: there are
free blocks, or holes, among the used blocks, and the virtual memory manager has to find a free
block big enough for the allocation so you can instantiate your variable. So even if you have
2GB of virtual address space, that does not mean you have 2GB contiguously. If you ask the virtual
memory manager for 2GB of space, it could fail because you don't have that much contiguous
address space. But for regular explanations that picture will suffice.
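The three allocation steps and the out-of-space case can be sketched with a toy model. ToyHeap and its members are invented for illustration and are not how the CLR is implemented; in particular, the real runtime would attempt a garbage collection before giving up with an OutOfMemoryException.

```csharp
using System;

// Toy model of a managed heap with a "next object pointer".
class ToyHeap
{
    private readonly int _capacity;
    private int _nextObjectPointer;   // offset of the next free slot

    public ToyHeap(int capacity)
    {
        _capacity = capacity;
    }

    public int NextObjectPointer
    {
        get { return _nextObjectPointer; }
    }

    // Step 1 (the size) is supplied by the caller; steps 2 and 3 happen here.
    public int Allocate(int size)
    {
        // Step 2: is there room left after the next object pointer?
        if (_nextObjectPointer + size > _capacity)
            throw new OutOfMemoryException("toy heap is full");

        // Step 3: hand back the object's address and advance the pointer.
        int address = _nextObjectPointer;
        _nextObjectPointer += size;
        return address;
    }
}

class ToyHeapDemo
{
    public static void Main()
    {
        ToyHeap heap = new ToyHeap(64);
        Console.WriteLine(heap.Allocate(24));        // 0
        Console.WriteLine(heap.Allocate(24));        // 24
        Console.WriteLine(heap.NextObjectPointer);   // 48
        try
        {
            heap.Allocate(24);                       // 48 + 24 exceeds 64
        }
        catch (OutOfMemoryException)
        {
            Console.WriteLine("no space left");
        }
    }
}
```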
Now we know how objects are allocated, and we spent some time on what the managed heap is. The
reason we discussed this is to help you understand why you need garbage collection and when it
is triggered.
There are three states of virtual memory. The Free state means the block of memory is available
for allocation. When you request an allocation, the block goes to the Reserved state, much like
booking a hotel room: the block is reserved for you but not used yet, and no one else can use it
either. When you finally use it, it goes to the Committed state, in which the block of memory has
physical storage associated with it.
Multiple conditions can trigger a garbage collection, and you already know the very first one:
running out of space for a new allocation in the virtual address space. Let's jump in and see
what the garbage collector actually does at a very basic level.
Garbage collection happens in two stages: mark and sweep. The mark stage searches for managed
objects that are still referenced in managed code. The sweep stage first attempts to finalize the
objects that are unreachable, and then reclaims their memory.
I know you are wondering what managed code is. We will come back to this later in the journey,
don't worry. For now, keep in mind that the garbage collector can only deal with managed code.
So, the technique is to mark the objects the program might still be using and clean up the rest.
But how would the garbage collector know which objects to clean up? How would it decide which
objects are unreachable? It uses something called an object graph, which is outside the scope of
this article. But I do have a nice representation to go with.
Let's assume this is the situation in the heap. You have a managed heap like this, and the garbage
collector kicks in because memory is running low. After the collection it would look like the
following.
Now it should be evident what basically happens in a garbage collection from a bird's-eye
view. A marked man needs finalization, and then he leaves.
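The mark and sweep idea can be sketched on a toy object graph. Node, MarkSweepSketch and the list standing in for the heap are all invented for illustration; the real collector works on actual memory, handles finalization and compacts the heap, none of which is modeled here.

```csharp
using System;
using System.Collections.Generic;

// Toy object: a name plus the objects it references.
class Node
{
    public string Name;
    public List<Node> References = new List<Node>();
    public bool Marked;

    public Node(string name)
    {
        Name = name;
    }
}

class MarkSweepSketch
{
    // Mark stage: flag everything reachable from a starting object.
    static void Mark(Node node)
    {
        if (node.Marked) return;
        node.Marked = true;
        foreach (Node child in node.References) Mark(child);
    }

    // Sweep stage: drop unmarked objects, reset marks, return survivor names.
    public static List<string> Collect(List<Node> heap, List<Node> roots)
    {
        foreach (Node root in roots) Mark(root);
        heap.RemoveAll(n => !n.Marked);

        List<string> survivors = new List<string>();
        foreach (Node n in heap)
        {
            n.Marked = false;
            survivors.Add(n.Name);
        }
        return survivors;
    }

    public static List<string> Demo()
    {
        Node a = new Node("A"), b = new Node("B"), c = new Node("C"), d = new Node("D");
        a.References.Add(b);   // A -> B
        c.References.Add(d);   // C -> D, but nothing reachable points to C
        List<Node> heap = new List<Node> { a, b, c, d };
        List<Node> roots = new List<Node> { a };
        return Collect(heap, roots);
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Demo()));   // A, B
    }
}
```

C and D reference each other's neighborhood but are unreachable from the root, so both are swept even though D still has an incoming reference.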
I still haven't properly explained how you get these marked objects. To understand that,
we need to understand generations.
Generations inside the heap reflect how long an object is expected to be needed, which divides
objects into long-lived and short-lived. There are three generations, indexed from zero:
1. Generation 0
This is the youngest generation and contains short-lived objects. Temporary and newly
allocated objects live here. This is the part of the heap where garbage collection happens
very frequently.
2. Generation 1
This is essentially a buffer between generation 0 and generation 2. It holds objects that
are still expected to be short-lived but have survived a generation 0 collection.
3. Generation 2
This is the generation of long-lived objects, usually objects that stay around for a long
time in the process. Statics come first to mind. A new object can also be allocated
straight to generation 2 instead of generation 0 if it is really big, like a large array
that needs a lot of space.
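A quick way to observe the "really big" case is to ask GC.GetGeneration about a small and a large array. This is a sketch under the assumption that arrays above roughly 85,000 bytes go to the large object heap, which the runtime reports as the oldest generation; the class name LargeObjectDemo and the array sizes are arbitrary.

```csharp
using System;

class LargeObjectDemo
{
    // Generation reported for a freshly allocated small array.
    public static int GenerationOfSmallArray()
    {
        byte[] small = new byte[100];
        return GC.GetGeneration(small);
    }

    // Generation reported for a freshly allocated large array
    // (well above the assumed large object threshold).
    public static int GenerationOfLargeArray()
    {
        byte[] large = new byte[200000];
        return GC.GetGeneration(large);
    }

    public static void Main()
    {
        Console.WriteLine("Small array generation: " + GenerationOfSmallArray());
        Console.WriteLine("Large array generation: " + GenerationOfLargeArray());
    }
}
```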
Garbage collections are generation-specific, but collecting a generation also collects every
younger generation. So if the GC clears generation 1, it also clears generation 0. If it clears
generation 2, it goes down to clear generation 1 and generation 0 as well.
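One way to see this from code is to compare GC.CollectionCount before and after collecting a specific generation. A sketch, with the class name GenerationCountDemo made up for illustration and three generations assumed:

```csharp
using System;

class GenerationCountDemo
{
    // Returns how much each generation's collection count grew
    // after forcing a collection of the given generation.
    public static int[] CountDeltas(int generation)
    {
        int[] before =
        {
            GC.CollectionCount(0),
            GC.CollectionCount(1),
            GC.CollectionCount(2)
        };

        GC.Collect(generation);

        return new[]
        {
            GC.CollectionCount(0) - before[0],
            GC.CollectionCount(1) - before[1],
            GC.CollectionCount(2) - before[2]
        };
    }

    public static void Main()
    {
        int[] deltas = CountDeltas(1);
        Console.WriteLine("Generation 0 collections: +" + deltas[0]);
        Console.WriteLine("Generation 1 collections: +" + deltas[1]);
        Console.WriteLine("Generation 2 collections: +" + deltas[2]);
    }
}
```

Collecting generation 1 should raise the counts for both generation 0 and generation 1, since the younger generation is collected along with it.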
I used the word survive a moment ago. What I meant is that if an object doesn't get reclaimed
during a sweep over its generation, it gets promoted to the next generation. If the survival rate
in a generation is high, the GC tries to increase the allocation threshold of that generation, so
that the next cleanup frees a larger amount of memory for the application.
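Promotion can be watched by keeping an object alive across forced collections and asking for its generation each time. A sketch with a made-up class name, PromotionDemo; the exact numbers depend on the runtime, although a progression from 0 up to 2 is what you would typically see.

```csharp
using System;

class PromotionDemo
{
    // Records the generation of one live object before any collection
    // and after each of two forced collections.
    public static int[] GenerationsAcrossCollections()
    {
        object survivor = new object();

        int atStart = GC.GetGeneration(survivor);
        GC.Collect();
        int afterFirst = GC.GetGeneration(survivor);
        GC.Collect();
        int afterSecond = GC.GetGeneration(survivor);

        // Make sure the object counts as referenced up to this point.
        GC.KeepAlive(survivor);
        return new[] { atStart, afterFirst, afterSecond };
    }

    public static void Main()
    {
        int[] generations = GenerationsAcrossCollections();
        Console.WriteLine("At start: generation " + generations[0]);
        Console.WriteLine("After first collection: generation " + generations[1]);
        Console.WriteLine("After second collection: generation " + generations[2]);
    }
}
```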
One more thing to remember here: the garbage collector stops managed threads while it works.
So it has to be quick and efficient, or you are looking at performance penalties.
If you have survived up until now, you deserve to go back to the code bit at the beginning. The
first thing I'm going to clear up is managed vs unmanaged resources. Managed resources are
directly under the control of the garbage collector; they are the result of managed code, which
is compiled to intermediate language. Unmanaged resources are resources your garbage collector
doesn't really know about. That includes open files, open network connections and, of course,
unmanaged memory. Now, if you use the C# classes for these, most of the time they are almost
managed: the managed code does the "dirty work" inside and you don't have to clean these up
yourself. The garbage collector cleans up the managed wrapper, and the managed wrapper cleans up
the unmanaged resource in the disposal process.
Now, let's go back to the word dispose. How would I dispose of something in my code? Is there a
method somewhere, something I could use? Indeed there is. A dispose implementation essentially
has two variations. Since the garbage collector can't handle unmanaged resources, you need to
wrap them. The first technique is to wrap them in a class derived from the SafeHandle class and
implement the IDisposable interface to make the wrapper properly disposable. This interface
exposes the Dispose() method you need, and you use it to dispose of the resources yourself.
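A minimal sketch of that first technique, following the commonly documented dispose pattern. The class name WrappedResource is hypothetical, and the SafeFileHandle is created from IntPtr.Zero purely as a placeholder; real code would obtain the handle from an actual native resource.

```csharp
using System;
using System.Runtime.InteropServices;
using Microsoft.Win32.SafeHandles;

// Hypothetical wrapper that owns an unmanaged handle through a SafeHandle.
class WrappedResource : IDisposable
{
    // Placeholder handle; a real class would receive one from a native call.
    private readonly SafeHandle _handle = new SafeFileHandle(IntPtr.Zero, true);
    private bool _disposed;

    public bool IsDisposed
    {
        get { return _disposed; }
    }

    // Public entry point required by IDisposable.
    public void Dispose()
    {
        Dispose(true);
        GC.SuppressFinalize(this);
    }

    // Does the actual cleanup; safe to call more than once.
    protected virtual void Dispose(bool disposing)
    {
        if (_disposed) return;

        if (disposing)
        {
            // Release the wrapped handle deterministically.
            _handle.Dispose();
        }
        _disposed = true;
    }
}

class DisposeDemo
{
    public static void Main()
    {
        WrappedResource resource = new WrappedResource();
        using (resource)
        {
            // Work with the resource here.
        }
        resource.Dispose();   // a second call is harmless
        Console.WriteLine(resource.IsDisposed);
    }
}
```

Because the SafeHandle takes care of releasing the native handle, the wrapper itself needs no finalizer in this variation.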