[Note: this article also appeared on CodeProject]
Introduction
While understanding garbage collection fundamentals is vital to working with .NET, it is also important to understand how object allocation works. Doing so shows you just how simple and performant allocation is, especially compared to the potentially blocking nature of native heap allocations. In a large, native, multi-threaded application, heap allocations can be a major performance bottleneck, which forces you to perform all sorts of custom heap management techniques. It’s also harder to measure when this is happening because many of those details are hidden behind the OS’s allocation APIs. More importantly, understanding this will give you clues to how you can mess up and make object allocation far less efficient.
In this article, I want to go through an example taken from Chapter 2 of Writing High-Performance .NET Code and then take it further with some additional examples that weren’t covered in the book.
Viewing Object Allocation in a Debugger
Let’s start with a simple object definition: completely empty.
```csharp
class MyObject
{
}

static void Main(string[] args)
{
    var x = new MyObject();
}
```
In order to examine what happens during allocation, we need to use a “real” debugger, like Windbg. Don’t be afraid of this. If you need a quick primer on how to get started, look at the free sample chapter on this page, which will get you up and running in no time. It’s not nearly as bad as you think.
Build the above program in Release mode for x86 (you can do x64 if you’d like, but the samples below are x86).
In Windbg, follow these steps to start and debug the program:
- Press Ctrl+E to execute a program. Navigate to and open the built executable file.
- Run the command `sxe ld clrjit` (this tells the debugger to break on loading any module with clrjit in the name, which you need loaded before the next steps).
- Run the command `g` (continues execution).
- When it breaks, run the command `.loadby sos clr` (loads the .NET debugging extension).
- Run the command `!bpmd ObjectAllocationFundamentals Program.Main` (sets a breakpoint at the beginning of a method; the first argument is the name of the assembly, the second is the name of the method, including the class it is in).
- Run the command `g` again.
Execution will break at the beginning of the `Main` method, right before `new()` is called. Open the Disassembly window to see the code.
Here is the `Main` method’s code, annotated for clarity:
```
; Copy the method table pointer for the class into
; ecx as the argument to new()
; You can use !dumpmt to examine this value.
mov ecx,006f3864h
; Call new
call 006e2100
; Copy the return value (address of the object) into a register
mov edi,eax
```
Note that the actual addresses will be different each time you execute the program. Step over (F10, or the toolbar button) a few times until `call 006e2100` (or your equivalent) is highlighted. Then Step Into it (F11). Now you will see the primary allocation mechanism in .NET. It’s extremely simple. Essentially, at the end of the current gen0 segment, there is a reserved bit of space which I will call the allocation buffer. If the allocation we’re attempting can fit in there, we can update a couple of values and return immediately without more complicated work.
If I were to outline this in pseudocode, it would look like this:
```
if (object fits in current allocation buffer)
{
    Increment the allocation pointer;
    return address;
}
else
{
    call JIT_New to do more complicated work in the CLR
}
```
The actual assembly looks like this:
```
; Set eax to the value 0x0c, the size of the object to
; allocate, which comes from the method table
006e2100 8b4104         mov  eax,dword ptr [ecx+4]  ds:002b:006f3868=0000000c
; Put allocation buffer information into edx
006e2103 648b15300e0000 mov  edx,dword ptr fs:[0E30h]
; edx+40 contains the address of the next available byte
; for allocation. Add that value to the desired size.
006e210a 034240         add  eax,dword ptr [edx+40h]
; Compare the intended allocation against the
; end of the allocation buffer.
006e210d 3b4244         cmp  eax,dword ptr [edx+44h]
; If we spill over the allocation buffer,
; jump to the slow path
006e2110 7709           ja   006e211b
; Update the pointer to the next free
; byte (0x0c bytes past the old value)
006e2112 894240         mov  dword ptr [edx+40h],eax
; Subtract the object size from the pointer to
; get the start of the new object
006e2115 2b4104         sub  eax,dword ptr [ecx+4]
; Put the method table pointer into the
; first 4 bytes of the object.
; eax now points to the new object
006e2118 8908           mov  dword ptr [eax],ecx
; Return to the caller
006e211a c3             ret
; Slow path - call into a CLR method
006e211b e914145f71     jmp  clr!JIT_New (71cd3534)
```
In the fast path, there are only 9 instructions, including the return. That’s incredibly efficient, especially compared to something like `malloc`. Yes, that complexity is traded for time at the end of the object’s lifetime, but so far, this is looking pretty good!
What happens in the slow path? The short answer is a lot. The following could all happen:
- A free slot somewhere in gen0 needs to be located
- A gen0 GC is triggered
- A full GC is triggered
- A new memory segment needs to be allocated from the operating system and assigned to the GC heap
- Objects with finalizers need extra bookkeeping
- Possibly more…
Another thing to notice is the size of the object: 0x0c (12 decimal) bytes. As covered elsewhere, this is the minimum size for an object in a 32-bit process, even if there are no fields.
Now let’s do the same experiment with an object that has a single `int` field.
```csharp
class MyObjectWithInt
{
    int x;
}
```
Follow the same steps as above to get into the allocation code.
The first line of the allocator on my run is:
```
00882100 8b4104 mov eax,dword ptr [ecx+4] ds:002b:00893874=0000000c
```
The only interesting thing is that the size of the object (0x0c) is exactly the same as before. The new `int` field fit into the minimum size. You can see this by examining the object with the `!DumpObject` command (or its abbreviated version, `!do`). To get the address of the object after it has been allocated, step over instructions until you get to the `ret` instruction. The address of the object is now in the eax register, so open up the Registers view to see the value. On my computer, it has a value of 2372770. Now execute the command: `!do 2372770`
You should see similar output to this:
```
0:000> !do 2372770
Name:        ConsoleApplication1.MyObjectWithInt
MethodTable: 00893870
EEClass:     008913dc
Size:        12(0xc) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f63b04  4000001        4         System.Int32  1 instance        0 x
```
This is curious. The field is at offset 4 (and an `int` has a length of 4), so that only accounts for 8 bytes (range 0-7). Offset 0 (i.e., the object’s address) contains the method table pointer, so where are the other 4 bytes? That is the sync block, which actually sits at offset -4, just before the object’s address. Together, these account for the 12 bytes.
Try it with a `long`.
```csharp
class MyObjectWithLong
{
    long x;
}
```
The first line of the allocator is now:
```
00f22100 8b4104 mov eax,dword ptr [ecx+4] ds:002b:00f33874=00000010
```
This shows a size of 0x10 (decimal 16) bytes, which is what we would expect: the 12-byte minimum object size already includes 4 bytes of field storage, so the 8-byte `long` needs an extra 4 bytes. An examination of the allocated object shows an object size of 16 bytes as well.
```
0:000> !do 2932770
Name:        ConsoleApplication1.MyObjectWithLong
MethodTable: 00f33870
EEClass:     00f313dc
Size:        16(0x10) bytes
File:        D:\Ben\My Documents\Visual Studio 2013\Projects\ConsoleApplication1\ConsoleApplication1\bin\Release\ConsoleApplication1.exe
Fields:
      MT    Field   Offset                 Type VT     Attr    Value Name
70f5b524  4000002        4         System.Int64  1 instance        0 x
```
If you put an object reference into the test class, you’ll see the same thing as you did with the `int`.
Finalizers
Now let’s make it more interesting. What happens if the object has a finalizer? You may have heard that objects with finalizers have more overhead during GC. This is true–they will survive longer, require more CPU cycles, and generally cause things to be less efficient. But do finalizers also affect object allocation?
Recall that our `Main` method above looked like this:

```
mov ecx,006f3864h
call 006e2100
mov edi,eax
```
If the object has a finalizer, however, it looks like this:
```
mov ecx,119386Ch
call clr!JIT_New (71cd3534)
mov esi,eax
```
We’ve lost our nifty allocation helper! We now jump directly to `JIT_New`. Allocating an object that has a finalizer is a LOT slower than allocating a normal object. More internal CLR structures need to be modified to track this object’s lifetime, so the cost isn’t just at the end of the object’s lifetime.
How much slower is it? In my own testing, it appears to be about 8-10x worse than the fast path of allocating a normal object. If you allocate a lot of objects, this difference is considerable. For this, and other reasons, just don’t add a finalizer unless it really is required.
Calling the Constructor
If you are particularly eagle-eyed, you may have noticed that there was no call to a constructor to initialize the object once allocated. The allocator changes some pointers and returns you an object, with no further function call on that object. This is because memory for a class’s fields is always pre-initialized to 0 for you, and these objects had no further initialization requirements. Let’s see what happens if we change to the following definition:
```csharp
class MyObjectWithInt
{
    int x = 13;
}
```
Now the `Main` function looks like this:

```
mov ecx,0A43834h
; Allocate memory
call 00a32100
; Copy the object address to esi
mov esi,eax
; Set object + 4 to the value 0x0D (13 decimal)
mov dword ptr [esi+4],0Dh
```
The field initialization was inlined into the caller!
Note that this code is exactly equivalent:
```csharp
class MyObjectWithInt
{
    int x;

    public MyObjectWithInt()
    {
        this.x = 13;
    }
}
```
But what if we do this?
```csharp
class MyObjectWithInt
{
    int x;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public MyObjectWithInt()
    {
        this.x = 13;
    }
}
```
This explicitly disables inlining for the object constructor. There are other ways of preventing inlining, but this is the most direct.
Now we can see the call to the constructor happening after the memory allocation:
```
mov ecx,0F43834h
call 00f32100
mov esi,eax
mov ecx,esi
call dword ptr ds:[0F43854h]
```
Exercise for the Reader
Can you get the allocator shown above to jump to the slow path? How big does the allocation request have to be to trigger this? (Hint: Try allocating arrays of various sizes.) Can you figure this out by examining the registers and other values from the running code?
Summary
You can see that in most cases, allocation of objects in .NET is extremely fast and efficient, requiring no calls into the CLR and no complicated algorithms in the simple case. Avoid finalizers unless absolutely needed. Not only are they less efficient during cleanup in a garbage collection, but they are slower to allocate as well.
Play around with the sample code in the debugger to get a feel for this yourself. If you wish to learn more about .NET memory handling, especially garbage collection, take a look at the book Writing High-Performance .NET Code.