Understanding the Microsoft Intermediate Language
Published: 23 Jan 2007
The file format of a managed module in .NET is based on the standard Microsoft Windows Portable Executable (PE) and Common Object File Format (COFF), which is why the host operating system treats a managed module as an executable. This article discusses the PE file structure, the CLR header and how the managed environment of Microsoft .NET works.
by Joydip Kanjilal

Intermediate code, generated when a program targeted at the JVM or the CLR is compiled, is the key to portability for both Java and Microsoft .NET. In .NET, the output of compilation comprises the MSIL instructions and the metadata, including the manifest, that describes the MSIL code. The MSIL code is passed as input to the JIT compiler, which translates it to native code. The MSIL is stored in a Portable Executable file, a format based on the standard Microsoft Windows Portable Executable (PE) and Common Object File Format (COFF).

The objective of this article is to give the reader a bird's eye view of MSIL, the PE file structure, the CLR header and how the managed environment of Microsoft .NET works. Before we delve into the internals of MSIL, we need a basic understanding of some related concepts: the Common Language Runtime (CLR), the Common Language Specification (CLS), the Common Type System (CTS), the Just In Time (JIT) compiler and the Class Loader. Each is discussed in the sections that follow, before the discussion of MSIL, PE and COFF.

The Common Language Runtime (CLR)

The Common Language Runtime is the runtime environment for loading and executing programs that use the managed environment. It acts as a layer of abstraction between .NET applications and the host operating system; in other words, it acts as a virtual CPU for the programs targeted at it. The CLR manages the MSIL code and is responsible for all of the following:

·         Memory Management

·         Garbage Collection

·         Cross-Language Compatibility

·         Versioning

·         Deployment

The Common Language Specification

The Common Language Specification (CLS) is a subset of the Common Type System (CTS). It defines the set of rules and conventions that govern interoperability among the languages that run within the .NET environment.

The Common Type System

The Common Type System (CTS) is a standard that defines the necessary rules for type interoperability for the languages targeting the .NET environment.  The common type system supports the following types:

·         Value Types (stored inline: on the stack for local variables, or inside the containing object)

·         Reference Types (Created in the managed heap)
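
The practical difference between the two kinds of types shows up on assignment. The following is a minimal sketch, not taken from the original article; the names PointValue, PointReference and TypeKindsDemo are hypothetical and exist only for illustration. Assigning a value type copies the data, while assigning a reference type copies only the reference to the same object in the managed heap.

```csharp
using System;

// Hypothetical types used only for this illustration.
struct PointValue { public int X; }       // value type
class PointReference { public int X; }    // reference type

public static class TypeKindsDemo
{
    // Mutates a copy of a value type and returns what the original holds.
    public static int MutateCopyOfValue()
    {
        PointValue a = new PointValue();
        a.X = 1;
        PointValue b = a;   // copies the whole value
        b.X = 42;
        return a.X;         // the original is unchanged
    }

    // Mutates through a second reference and returns what the original holds.
    public static int MutateCopyOfReference()
    {
        PointReference a = new PointReference();
        a.X = 1;
        PointReference b = a;   // copies only the reference
        b.X = 42;
        return a.X;             // both variables refer to the same object
    }

    public static void Main()
    {
        Console.WriteLine(MutateCopyOfValue());      // prints 1
        Console.WriteLine(MutateCopyOfReference());  // prints 42
    }
}
```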

The Just In Time Compiler (JIT)

The JIT compiler compiles the IL into native code, which is then executed by the processor. MSIL is the platform-independent intermediate code into which all programs written in languages targeting the .NET Runtime are compiled. This holds for every .NET language compiler, because they all target the .NET Runtime and not the host operating system. However, as with Java's JVM, a different implementation of the CLR and its JIT compiler is needed for each platform on which the application is to be executed.

The .NET Runtime (CLR) compiles this intermediate code (MSIL, or IL for short) to native code using the JIT compiler. Note that each method needs to be jitted only once, and code that is never called is never jitted. The JIT handles the following:

·         Ensuring type safety and handling violations appropriately

·         Optimizations

·         Assembly Verification

Class Loader

The MSIL code runs within the context of the CLR. The .text section of the PE file contains the instruction JMP _CorExeMain, a jump to the function _CorExeMain that is defined in MSCorEE.dll. This function reads the metadata of the MSIL code, has the MSIL code compiled to native code at runtime and, finally, executes the Main method.

The Class Loader is the component responsible for loading classes at runtime. It reads the metadata, validates it for consistency and creates an internal representation of the classes and their members. Internally, the CLR uses the metadata and manifest information of an assembly to load a class through the Class Loader. It should be noted that assembly, type and method references are all resolved at runtime. Serge Lidin says in his renowned book Inside Microsoft .NET IL Assembler, "the loader reads the metadata and creates in memory an internal representation and layout of the classes and their members. It performs this task on demand, meaning that a class is loaded and laid out only when it is referenced. Classes that are never referenced are never loaded. When loading a class, the loader runs a series of consistency checks of the related metadata."
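
The on-demand behavior can be observed indirectly from C#. The sketch below is an analogy rather than a direct view of the loader: it watches static constructors, which run when a type is first used, and that is type initialization rather than loading itself. All names (InitLog, Referenced, NeverReferenced, OnDemandDemo) are hypothetical and not part of the original article.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper that records the order of events.
public static class InitLog
{
    public static readonly List<string> Events = new List<string>();
}

class Referenced
{
    // Runs once, when the class is first used.
    static Referenced() { InitLog.Events.Add("Referenced initialized"); }
    public static void Touch() { }
}

class NeverReferenced
{
    // Never runs in this program, because nothing uses the class.
    static NeverReferenced() { InitLog.Events.Add("NeverReferenced initialized"); }
    public static void Touch() { }
}

public static class OnDemandDemo
{
    public static List<string> Run()
    {
        InitLog.Events.Add("Run started");
        Referenced.Touch();   // first use triggers the static constructor
        return InitLog.Events;
    }

    public static void Main()
    {
        foreach (string entry in Run())
        {
            Console.WriteLine(entry);
        }
    }
}
```

Only two events are recorded, "Run started" followed by "Referenced initialized"; the class that is never referenced leaves no trace.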

The Managed Execution Process

The CLR is responsible for many things, such as memory management, garbage collection, etc. Let us now understand how the CLR engine takes care of the managed execution of the executables inside of its environment at runtime.

On compilation of the source in a .NET environment, an assembly is generated that houses the MSIL instructions and the metadata. When the assembly is run, the Win32 loader loads the CLR and hands control over to it. The CLR, in turn, loads the assembly into its environment and locates the entry point, the Main method. The JIT compiler then translates the MSIL code to native code, applying optimizations where appropriate, so that the processor can execute the instructions.

As execution continues in the context of the CLR, objects are created, used and reclaimed when they are no longer needed. These objects are created in the managed heap. Why managed? Because the CLR decides when to free these objects and reclaim their memory to make room for other objects; it invokes the garbage collector to do so. Learn more about how the CLR handles memory management and garbage collection at the following links.

Understanding Garbage Collection in .NET

When and How to Use Dispose and Finalize in C#

Microsoft Intermediate Language (MSIL)

MSIL is the CPU-independent instruction set (also known as the Common Intermediate Language, or CIL, instruction set) that language compilers generate for programs written in languages that target the .NET managed environment. When the compiler compiles managed code, it produces this intermediate code, which is independent of the underlying operating system and hardware. MSIL is not interpreted; rather, it is converted to native code by the Just in Time (JIT) compiler before execution, and it is verified for type safety at runtime to ensure security and reliability. Because it is an intermediate code, the same MSIL can be compiled to native code on any supported architecture. Further, the MSIL code is not converted in its entirety in one go. Rather, the JIT compiler converts it to native code as and when it is needed at runtime, and the resulting native code is cached so that subsequent calls can reuse it.

The compiler produces metadata along with the MSIL code when it compiles any program targeted at the CLR's execution environment. The metadata, which includes the assembly manifest, describes the MSIL. It typically contains the following:

·         Definition and signature of all types inside the code

·         The types that are referenced inside the code

·         Runtime information needed for execution

The assembly metadata helps in the following aspects:

·         Verification

·         Object Serialization

·         Garbage Collection

·         Reflection to inspect the types at runtime
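
As a small illustration of the last point, the sketch below uses the System.Reflection API to read a type's metadata at runtime. The class name MetadataDemo and the helper PublicStaticMethodNames are hypothetical names chosen for this example; the reflection calls themselves are standard.

```csharp
using System;
using System.Reflection;

public static class MetadataDemo
{
    // Returns the names of the public static methods declared directly on a type,
    // as recorded in the assembly's metadata.
    public static string[] PublicStaticMethodNames(Type type)
    {
        MethodInfo[] methods = type.GetMethods(
            BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly);
        string[] names = new string[methods.Length];
        for (int i = 0; i < methods.Length; i++)
        {
            names[i] = methods[i].Name;
        }
        return names;
    }

    public static void Main()
    {
        Type type = typeof(MetadataDemo);
        Console.WriteLine("Type: " + type.FullName);
        Console.WriteLine("Base type: " + type.BaseType);
        Console.WriteLine("Assembly: " + type.Assembly.GetName().Name);
        foreach (string name in PublicStaticMethodNames(type))
        {
            Console.WriteLine("Method: " + name);
        }
    }
}
```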

Let us now look at the MSIL code that is generated on compilation of a simple source file. We will use the simplest possible code to avoid unnecessary complexity. Consider the code shown in Listing 1.

Listing 1: A simple class to display a text in C#

public class test {
    public static void Main(string[] args) {
        System.Console.WriteLine("Joydip Kanjilal");
    }
}

The following is the MSIL code for the class "test" that is generated on compilation of the source code shown in Listing 1.

Listing 2: The MSIL code that is generated on compilation

.class public auto ansi beforefieldinit test extends [mscorlib]System.Object
{
  .method public hidebysig static void  Main(string[] args) cil managed
  {
    .entrypoint
    .maxstack  1
    IL_0000:  ldstr      "Joydip Kanjilal"
    IL_0005:  call       void [mscorlib]System.Console::WriteLine(string)
    IL_000a:  ret
  }

  .method public hidebysig specialname rtspecialname instance void  .ctor() cil managed
  {
    .maxstack  1
    IL_0000:  ldarg.0
    IL_0001:  call       instance void [mscorlib]System.Object::.ctor()
    IL_0006:  ret
  }
}


The IL instructions are encoded as operation codes, or opcodes, that are either 1 byte or 2 bytes long. This section discusses some of the opcodes that are in frequent use. We will now walk through the MSIL code in Listing 2, generated on compilation of the source code in Listing 1, to understand these concepts better.

It should be noted that any class in .NET implicitly derives from the class Object that belongs to the System namespace. Mscorlib.dll contains the definitions of the core base classes from which other classes are derived. The .entrypoint directive marks the method from which the program's execution starts. The ret instruction in both methods (the Main method and the default constructor) returns from the method to its caller. Note that the "call" MSIL instruction that invokes WriteLine includes the method signature (argument types and return type) as well as the namespace and the class to which WriteLine belongs. This helps in validating the code for consistency and integrity at runtime.
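
The three instructions used by Main in Listing 2 (ldstr, call and ret) can also be emitted at runtime, which makes them easier to experiment with. The following is a sketch using the System.Reflection.Emit API; the class name EmitDemo, the method name BuildHello and the dynamic method name "Hello" are hypothetical. It also prints the encoded size of a few opcodes, showing that ldstr occupies 1 byte while ceq, an instruction not used in Listing 2, occupies 2 bytes.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class EmitDemo
{
    // Builds, at runtime, a method with the same three instructions as Main in Listing 2.
    public static Action BuildHello()
    {
        MethodInfo writeLine = typeof(Console).GetMethod(
            "WriteLine", new Type[] { typeof(string) });

        DynamicMethod method = new DynamicMethod("Hello", typeof(void), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldstr, "Joydip Kanjilal");   // push the string reference
        il.Emit(OpCodes.Call, writeLine);            // call Console.WriteLine(string)
        il.Emit(OpCodes.Ret);                        // return to the caller

        // The IL is compiled to native code when the delegate is created and invoked.
        return (Action)method.CreateDelegate(typeof(Action));
    }

    public static void Main()
    {
        BuildHello()();

        // Encoded size, in bytes, of some opcodes.
        Console.WriteLine("ldstr size: " + OpCodes.Ldstr.Size);
        Console.WriteLine("ret size:   " + OpCodes.Ret.Size);
        Console.WriteLine("ceq size:   " + OpCodes.Ceq.Size);
    }
}
```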

The MSIL instruction ldstr pushes a reference to the string literal that follows it onto the evaluation stack. The hidebysig attribute stands for "hide by name and signature": the method hides only those inherited methods that have the same name and the same signature, rather than every inherited method with the same name. The auto attribute implies that the layout of the class is determined at runtime, while the ansi attribute specifies that strings are marshaled as ANSI strings when managed code interoperates with unmanaged code.

Needless to say, the public attribute on a class member implies that the member can be invoked from any other part of the program. The static attribute implies that a member belongs to the class and not to an instance of it, i.e., a static member exists in memory even before the class is instantiated and is shared across all instances. The .ctor method in Listing 2 is the constructor. The .maxstack directive indicates the maximum number of elements that can be held on the evaluation stack while the method is executing. As noted earlier, none of this code is interpreted; the JIT compiler compiles it to native code at runtime before it executes.

That completes the explanation of the MSIL code in Listing 2. Let us now understand the file formats in which the MSIL code is stored, the Portable Executable (PE) format and the Common Object File Format (COFF).

Portable Executable Files and Common Object File Format

The MSIL and its metadata are contained in a portable executable (PE) file. The Portable Executable format came into being with the intent of having a common file format for all flavors of Windows operating systems on all supported CPUs. It is based on the Common Object File Format (COFF). This file format, which accommodates MSIL or native code as well as metadata, enables the operating system to recognize common language runtime images. The presence of metadata in the file along with the MSIL makes the code self-describing, thereby eliminating the need for type libraries or Interface Definition Language (IDL). At runtime, the metadata is read as and when it is needed for execution.

When a program targeted at the CLR is compiled, it generates MSIL code, which is in turn stored in the Portable Executable (PE) format. The PE format (based on COFF) specifies a portable file format for executables, object code and DLLs that are used in Windows operating systems. All .NET assemblies are PE files, which makes them portable across all 32-bit and 64-bit Windows operating systems. The code, metadata and resources of the assembly are stored in the sections of the PE file (compilers typically place them in the .text section). The following is the layout of a PE file (also known as an image file) in the COFF format.

Listing 3: The PE File Structure
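
The start of this layout can be probed with a few lines of code. The sketch below is not a full PE parser and makes several assumptions: it relies on the publicly documented PE/COFF header offsets (the "MZ" signature at offset 0, the offset of the "PE" signature stored at 0x3C, a 20-byte COFF header, and the CLR header entry at index 14 of the data directories), and the names PeHeaderDemo, HasPeSignature and HasClrHeader are hypothetical.

```csharp
using System;
using System.IO;

public static class PeHeaderDemo
{
    // True when the stream starts with an MS-DOS "MZ" header whose field at
    // offset 0x3C points at a "PE\0\0" signature. The stream must be seekable
    // and is left open.
    public static bool HasPeSignature(Stream stream)
    {
        if (stream.Length < 0x40) return false;
        BinaryReader reader = new BinaryReader(stream);
        stream.Position = 0;
        if (reader.ReadUInt16() != 0x5A4D) return false;      // "MZ"
        stream.Position = 0x3C;
        long peOffset = reader.ReadInt32();
        if (peOffset < 0 || peOffset + 4 > stream.Length) return false;
        stream.Position = peOffset;
        return reader.ReadUInt32() == 0x00004550;             // "PE\0\0"
    }

    // True when data directory 14 (the CLR header entry) is present and non-empty,
    // which is how a managed image can be told apart from a native one.
    public static bool HasClrHeader(Stream stream)
    {
        if (!HasPeSignature(stream)) return false;
        BinaryReader reader = new BinaryReader(stream);
        stream.Position = 0x3C;
        long optionalHeader = reader.ReadInt32() + 24L;       // 4-byte signature + 20-byte COFF header
        if (optionalHeader + 2 > stream.Length) return false;
        stream.Position = optionalHeader;
        ushort magic = reader.ReadUInt16();

        long directories;
        if (magic == 0x10B) directories = optionalHeader + 96;        // PE32
        else if (magic == 0x20B) directories = optionalHeader + 112;  // PE32+
        else return false;

        long clrEntry = directories + 14 * 8;                 // each entry is 8 bytes
        if (clrEntry + 8 > stream.Length) return false;
        stream.Position = directories - 4;                    // NumberOfRvaAndSizes
        if (reader.ReadUInt32() <= 14) return false;
        stream.Position = clrEntry;
        uint rva = reader.ReadUInt32();
        uint size = reader.ReadUInt32();
        return rva != 0 && size != 0;
    }

    public static void Main(string[] args)
    {
        // With no argument, inspect the library that defines System.Object.
        string path = args.Length > 0 ? args[0] : typeof(object).Assembly.Location;
        if (string.IsNullOrEmpty(path))
        {
            Console.WriteLine("No file to inspect; pass a path as the first argument.");
            return;
        }
        using (FileStream file = File.OpenRead(path))
        {
            Console.WriteLine("PE signature: " + HasPeSignature(file));
            Console.WriteLine("CLR header:   " + HasClrHeader(file));
        }
    }
}
```

Running it against a native Windows executable and against a .NET assembly should show the difference in the second line: both are PE files, but only the managed image is expected to carry a CLR header entry.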

The ILASM, ILDASM and the NGEN tools

We will conclude with a discussion of some related tools, ILASM, ILDASM and NGEN, that ship with the Microsoft .NET Framework SDK. The Microsoft Intermediate Language Assembler, or ILASM tool, is used to assemble MSIL code and store it in a file in the Portable Executable (PE) format. The Microsoft Intermediate Language Disassembler, or ILDASM tool, is used to retrieve the MSIL code from a PE file. According to MSDN, "the MSIL Disassembler is a companion tool to the MSIL Assembler (Ilasm.exe). Ildasm.exe takes a portable executable (PE) file that contains Microsoft intermediate language (MSIL) code and creates a text file suitable as input to Ilasm.exe." The Native Image Generator, or NGEN tool, is used to generate native code from the MSIL code ahead of time.



This article has discussed the architecture of MSIL, the PE and COFF file formats, and their related concepts and terminology. I have tried to put all these concepts in one place and hope that readers will benefit from this article. Please go through the book and the links mentioned above for a detailed understanding of these concepts.
