MSIL (Microsoft Intermediate Language), also known as the Common
Intermediate Language (CIL), is the CPU-independent instruction set that
language compilers generate for programs written in languages that target the
.NET managed environment. It is not interpreted; rather, it is compiled to
native code before it executes. When a compiler compiles managed code, it
produces this intermediate code, which is independent of the underlying OS and
the system's hardware. The Just-in-Time (JIT) compiler of the CLR in turn
converts this intermediate code to native code. The MSIL code is verified for
type safety at runtime, typically as part of JIT compilation, to ensure
security and reliability. Because it is an intermediate code, the same MSIL
can be generated on, and compiled to native code on, any "supported
architecture". Further, the MSIL code is not converted in its entirety in one
go. Rather, the JIT compiler converts each method to native code as and when
it is first needed at runtime. The resulting native code is cached so that
subsequent calls can reuse it.
The compiler produces metadata along with the MSIL code when it compiles any
program that targets the CLR's execution environment. The metadata describes
the types in the MSIL code and includes the assembly manifest, which describes
the assembly itself. It typically contains the following:
· Definition and signature of all types inside the code
· The types that are referenced inside the code
· Runtime information needed for execution
The assembly metadata helps in the following aspects:
· Verification
· Object Serialization
· Garbage Collection
· Reflection to inspect the types at runtime
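As a rough illustration of the last point, the following sketch uses reflection to read a type's metadata at runtime. The class name MetadataDemo is hypothetical and chosen only for this example. The output depends on the runtime and the build, so none is shown.

```csharp
using System;
using System.Reflection;

public class MetadataDemo
{
    public static void Main()
    {
        // typeof reads the type's metadata; no instance is created.
        Type type = typeof(MetadataDemo);
        Console.WriteLine("Type: " + type.FullName);
        Console.WriteLine("Base type: " + type.BaseType);

        // List only the members declared on this type, not inherited ones.
        BindingFlags flags = BindingFlags.Public | BindingFlags.NonPublic |
                             BindingFlags.Static | BindingFlags.Instance |
                             BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            Console.WriteLine("Method: " + method);
        }

        // The assembly's metadata also records the assemblies it references.
        foreach (AssemblyName reference in type.Assembly.GetReferencedAssemblies())
        {
            Console.WriteLine("References: " + reference.Name);
        }
    }
}
```

Everything printed here comes from the metadata stored alongside the MSIL, not from the source code.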
Let us now examine the MSIL code that is generated when a simple program is
compiled. We will use the simplest possible code to avoid unnecessary
complexity. Consider the code shown in Listing 1.
Listing 1: A simple class that displays text in C#
public class test
{
    public static void Main(string[] args)
    {
        System.Console.WriteLine("Joydip Kanjilal");
    }
}
The following is the MSIL code for the class "test" that is generated when the
source code shown in Listing 1 is compiled.
Listing 2: The MSIL code that is generated on compilation
.class public auto ansi beforefieldinit test extends [mscorlib]System.Object
{
    .method public hidebysig static void Main(string[] args) cil managed
    {
        .entrypoint
        .maxstack 1
        IL_0000: ldstr "Joydip Kanjilal"
        IL_0005: call void [mscorlib]System.Console::WriteLine(string)
        IL_000a: ret
    }
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
        .maxstack 1
        IL_0000: ldarg.0
        IL_0001: call instance void [mscorlib]System.Object::.ctor()
        IL_0006: ret
    }
}
Explanation
Each IL instruction begins with an operation code (opcode) that is either 1
byte or 2 bytes long. This section discusses some of the opcodes that are in
frequent use. We will now walk through the MSIL code in Listing 2, which was
generated from the source code in Listing 1, to understand these concepts
better.
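The opcode sizes can be checked with the OpCodes class in System.Reflection.Emit. The sketch below is only illustrative. The class name OpCodeSizes is hypothetical, and ceq does not appear in Listing 2 but is included as an example of a two-byte opcode.

```csharp
using System;
using System.Reflection.Emit;

public class OpCodeSizes
{
    public static void Main()
    {
        // ldstr, call and ret appear in Listing 2; ceq is included only as
        // an example of a two-byte opcode.
        OpCode[] opcodes = { OpCodes.Ldstr, OpCodes.Call, OpCodes.Ret, OpCodes.Ceq };

        foreach (OpCode opcode in opcodes)
        {
            // Size is the length of the opcode itself in bytes; it does not
            // include the operand that may follow the opcode.
            Console.WriteLine("{0,-6} size={1} operand={2}",
                opcode.Name, opcode.Size, opcode.OperandType);
        }
    }
}
```

The ldstr and call opcodes are one byte each and are followed by a 4-byte metadata token. This is why the offsets in Listing 2 advance by five, from IL_0000 to IL_0005 to IL_000a.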
It should be noted that every class in .NET derives, directly or indirectly,
from the class Object that belongs to the System namespace. The assembly
mscorlib.dll contains the core base classes, including System.Object, from
which other classes inherit. The .entrypoint directive marks the method at
which the program's execution starts. The ret instruction in both of these
methods (the Main method and the default constructor) returns control to the
caller and so marks the end of the method. Note that the statement that calls
the method WriteLine (using the "call" MSIL instruction) incorporates the
method signature (argument types and return type) as well as the assembly,
the namespace and the class to which WriteLine belongs. This is helpful in
validating the code for consistency and integrity at runtime.
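To see these instructions as raw bytes, a method's IL can be read back through reflection. The following is a minimal sketch. The class IlDump and the method Greet are hypothetical names, with Greet mirroring the body of Main in Listing 1. The exact bytes depend on the compiler and the build configuration (a debug build, for instance, adds nop instructions), so no output is shown.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class IlDump
{
    // Hypothetical method that mirrors the body of Main in Listing 1.
    public static void Greet()
    {
        Console.WriteLine("Joydip Kanjilal");
    }

    public static void Main()
    {
        MethodInfo method = typeof(IlDump).GetMethod("Greet");
        MethodBody body = method.GetMethodBody();
        byte[] il = body.GetILAsByteArray();

        // Print the raw IL bytes of Greet as hyphen-separated hex values.
        Console.WriteLine(BitConverter.ToString(il));

        // A method that returns void ends with the one-byte ret opcode.
        bool endsWithRet = il[il.Length - 1] == (byte)OpCodes.Ret.Value;
        Console.WriteLine("Ends with ret: " + endsWithRet);
    }
}
```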
The MSIL instruction ldstr pushes a reference to the string literal passed to
it onto the evaluation stack. The hidebysig attribute indicates that the
method hides inherited methods that have the same name and signature, rather
than every inherited method with the same name. The auto attribute implies
that the layout of the class's fields is determined by the runtime, while the
ansi attribute specifies that strings are marshaled as ANSI strings for
interoperability between managed and unmanaged code.
The public attribute on a class member implies that the member can be invoked
from any other part of the program. The static attribute implies that a member
belongs to the class and not to an instance of it, i.e., a static member can
be used even before the class is instantiated. Further, a static member of a
class is shared across all instances of it.
Note that .ctor in the MSIL code in Listing 2 denotes a constructor. Here it
is the default constructor that the compiler generated: it loads the current
instance with ldarg.0 and calls the constructor of System.Object. Note also
the .maxstack directive. It indicates the maximum number of items that can be
on the evaluation stack at any point while the method executes.
This completes the explanation of the MSIL code shown in Listing 2. The
following section discusses the Portable Executable (PE) and Common Object
File Format (COFF) file formats, in which the MSIL code is stored.