Published: 23 Jan 2007
|
Abstract
The file format of a managed module in .NET is based on the standard Microsoft Windows Portable Executable and Common Object File Format. It conforms to the Windows PE/COFF standard and this is why the host operating system treats the managed module as an executable. This article discusses the PE file structure, the CLR header and how the managed environment of Microsoft.NET works.
|
|
by Joydip Kanjilal
|
Introduction |
The intermediate code that is generated when a program targeted
at the JVM or the CLR is compiled is the key to portability in both Java
and Microsoft .NET. When a program that targets the .NET runtime
environment is compiled, the output consists of MSIL instructions and the
metadata that describes them, including the assembly manifest. At
runtime, this MSIL code is passed as input to the JIT compiler, which
translates it to native code. The MSIL code is stored in a file whose
format is based on the standard Microsoft Windows Portable Executable (PE)
and Common Object File Format (COFF).
The objective of this article is to give the reader a
bird's eye view of MSIL, the PE file structure, the CLR header and how the
managed environment of Microsoft .NET works. Before we delve into the
internals of MSIL, we need a basic understanding of some related concepts
and terms: the CLR, CLS, CTS, JIT compiler and Class Loader. The sections
that follow discuss each of these in turn, as they are prerequisites for
a proper understanding of MSIL, PE and COFF.
|
The Common Language Runtime (CLR) |
The Common Language Runtime is the runtime environment for
loading and executing programs that use the managed environment. It acts as
a layer of abstraction between .NET applications and the host operating
system. In other words, it acts as a virtual CPU for the programs that are
targeted at it. The CLR manages the execution of MSIL code and is
responsible for all of the following:
· Memory Management
· Garbage Collection
· Cross-Language Compatibility
· Versioning
· Deployment
|
The Common Language Specification |
The Common Language Specification (CLS) is a subset of the Common
Type System (CTS). It defines the set of rules and conventions that a
language must follow so that code written in it can interoperate with
code written in any other language that runs within the .NET environment.
|
The Common Type System |
The Common Type System (CTS) is a standard that defines the
necessary rules for type interoperability for the languages targeting the .NET
environment. The common type system supports the following types:
· Value Types (typically allocated on the stack, or inline within the object that contains them)
· Reference Types (allocated on the managed heap)
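The practical difference between the two kinds of types shows up on assignment. The following is a minimal sketch, not taken from the article; the names PointValue, PointRef and CtsDemo are hypothetical and exist only for illustration.

```csharp
using System;

// Hypothetical types for illustration; they are not part of the article.
public struct PointValue { public int X; }   // a value type
public class PointRef { public int X; }      // a reference type

public static class CtsDemo
{
    // Assigning a value type copies the whole value.
    public static int MutateCopyOfValue()
    {
        PointValue a = new PointValue { X = 1 };
        PointValue b = a;   // b is an independent copy
        b.X = 2;
        return a.X;         // still 1
    }

    // Assigning a reference type copies only the reference.
    public static int MutateCopyOfReference()
    {
        PointRef a = new PointRef { X = 1 };
        PointRef b = a;     // a and b refer to the same heap object
        b.X = 2;
        return a.X;         // now 2
    }

    public static void Main()
    {
        Console.WriteLine(typeof(PointValue).IsValueType); // True
        Console.WriteLine(typeof(PointRef).IsValueType);   // False
        Console.WriteLine(MutateCopyOfValue());            // 1
        Console.WriteLine(MutateCopyOfReference());        // 2
    }
}
```

Where a value type is physically stored depends on context (a local variable, a field of a heap object, a boxed copy), so the sketch demonstrates copy semantics rather than storage location.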
|
The Just In Time Compiler (JIT) |
The JIT compiler compiles the IL into native code, which is
then executed by the processor. MSIL is a platform-independent
intermediate code into which all programs written in languages targeting
the .NET runtime are converted. The platform independence stems from the
fact that every compiler for a .NET-compliant language targets the .NET
runtime and not the host operating system, so all of them emit this same
intermediate code. However, it should be noted that, as with Java's JVM
and JIT, a different implementation of the CLR and its JIT compiler is
needed for each platform on which an application is to be executed. The
.NET runtime (CLR) is responsible for compiling the intermediate code
(MSIL, or IL for short) to native code using the Just In Time (JIT)
compiler. Note that each method in the MSIL code needs to be jitted only
once, and code that is never called is never jitted. The JIT handles the
following:
· Ensuring type safety and handling violations appropriately
· Optimizations
· Assembly Verification
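The "jitted only once, and only when called" behavior can be observed from managed code. The sketch below is an illustration under stated assumptions: it relies on System.Runtime.JitInfo.GetCompiledMethodCount, which exists only in .NET 6 and later (it is not available in the .NET Framework this article was written for), and the names JitDemo, Square and CompiledDuring are hypothetical. The counter is process-wide, so the exact numbers vary from run to run.

```csharp
using System;
using System.Runtime;
using System.Runtime.CompilerServices;

public static class JitDemo
{
    // NoInlining keeps Square a separate method, so the JIT has to
    // compile it when it is first called rather than fold it into a caller.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Square(int x) { return x * x; }

    // Returns how many methods the JIT compiled while the action ran.
    // Assumption: requires .NET 6 or later for JitInfo.
    public static long CompiledDuring(Action action)
    {
        long before = JitInfo.GetCompiledMethodCount();
        action();
        return JitInfo.GetCompiledMethodCount() - before;
    }

    public static void Main()
    {
        Action call = () => Square(3);

        // The first invocation has to compile the lambda and Square.
        long first = CompiledDuring(call);
        // The second invocation reuses the native code already produced.
        long second = CompiledDuring(call);

        Console.WriteLine("Methods compiled during first call:  " + first);
        Console.WriteLine("Methods compiled during second call: " + second);
    }
}
```

On a typical run the first figure is at least 1 and the second is usually 0. Modern runtimes with tiered compilation may recompile hot methods in the background, so the second figure is not guaranteed to be zero.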
|
Class Loader |
The MSIL code runs within the context of the CLR. The .text
section of the PE file contains a stub instruction, JMP _CorExeMain, that
jumps to the _CorExeMain function exported by MSCorEE.dll. This function
starts the runtime, which reads the metadata of the assembly, compiles the
MSIL code of the entry point to native code at runtime and finally
executes the Main method. The Class Loader is the component responsible
for loading classes at runtime: it reads the metadata, validates it for
consistency and creates an internal representation of the classes and
their members. Internally, the CLR uses the metadata and manifest
information of an assembly to load a class through the Class Loader. It
should be noted that assembly, type and method references are all
resolved at runtime. Serge Lidin says in his renowned book Inside Microsoft .NET IL Assembler, "the loader reads
the metadata and creates in memory an internal representation and layout of the
classes and their members. It performs this task on demand, meaning that a
class is loaded and laid out only when it is referenced. Classes that are never
referenced are never loaded. When loading a class, the loader runs a series of
consistency checks of the related metadata."
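The on-demand behavior described in the quotation can be approximated from C#. Class loading itself is not directly observable, so the sketch below uses static constructors as a stand-in: a type's static constructor runs only when the type is first used. This is an assumption-laden illustration (type initialization is related to, but not the same as, loading), and the names LoadLog, Used, NeverUsed and LazyLoadDemo are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Collects the names of the types whose static constructors have run.
public static class LoadLog
{
    public static readonly List<string> Entries = new List<string>();
}

public class Used
{
    static Used() { LoadLog.Entries.Add("Used"); }
    public static void Touch() { }
}

public class NeverUsed
{
    // Never executes in this program, because nothing references NeverUsed at runtime.
    static NeverUsed() { LoadLog.Entries.Add("NeverUsed"); }
    public static void Touch() { }
}

public static class LazyLoadDemo
{
    public static List<string> Run()
    {
        Used.Touch();            // first use of Used triggers its static constructor
        return LoadLog.Entries;  // NeverUsed is compiled into the assembly but never touched
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run())); // prints "Used"
    }
}
```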
|
The Managed Execution Process |
The CLR is responsible for many things, such as memory
management, garbage collection, etc. Let us now understand how the CLR engine
takes care of the managed execution of the executables inside of its
environment at runtime.
When source code is compiled in a .NET environment, an
assembly is generated that houses the IL instructions and metadata. When
the assembly is run, the Win32 loader loads the CLR and hands control over
to it. The CLR, in turn, loads the assembly into its environment and
locates the entry point, the Main method. It is then the turn of the JIT
compiler to translate the MSIL code to native code, optimizing it where
possible, so that the processor can execute the instructions. As the code
executes in the context of the CLR, objects are created, used and
reclaimed when they are no longer needed. These objects are created on the
managed heap. Why managed? Because the CLR decides when to free these
objects and reclaim their memory to make room for other objects. The CLR
invokes the garbage collector to reclaim the memory of objects that are no
longer in use. Learn more on how memory management and garbage collection
are handled by the CLR at the following links.
Understanding Garbage Collection in .NET
When and How to Use Dispose and Finalize in C#
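The reclamation of unused objects described above can be made visible with the System.GC class. The following is a minimal sketch, not the article's own code; the name GcDemo and the allocation sizes are arbitrary choices for illustration. Forcing a collection with GC.Collect is done here only to make the effect observable and is rarely appropriate in production code.

```csharp
using System;

public static class GcDemo
{
    // Allocates many short-lived arrays, then reports how many
    // generation 0 collections happened in the meantime.
    public static int Gen0CollectionsDuringChurn()
    {
        int before = GC.CollectionCount(0);
        long total = 0;
        for (int i = 0; i < 100000; i++)
        {
            byte[] scratch = new byte[1024]; // unreachable after this iteration
            total += scratch.Length;
        }
        GC.Collect();                  // request a collection of all generations
        GC.WaitForPendingFinalizers(); // let any finalizers finish
        return GC.CollectionCount(0) - before;
    }

    public static void Main()
    {
        Console.WriteLine("Gen 0 collections: " + Gen0CollectionsDuringChurn());
        Console.WriteLine("Bytes currently allocated: " + GC.GetTotalMemory(false));
    }
}
```

The program never frees the arrays explicitly; the runtime reclaims them once they are unreachable, which is what "managed heap" means in practice.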
|
Microsoft Intermediate Language (MSIL) |
MSIL is the CPU-independent instruction set (also known as the
Common Intermediate Language, or CIL, instruction set) that language
compilers generate for programs written in languages that target the .NET
managed environment. It is not interpreted; rather, it is compiled to
native code before its execution. When a compiler compiles managed code,
it produces this intermediate code, which is independent of the underlying
OS and the system's hardware. The intermediate code is in turn converted
to native code by the Just In Time (JIT) compiler. The MSIL code is
verified for type safety at runtime to ensure security and reliability.
Because it is an intermediate code, the same MSIL can be compiled to
native code on any supported architecture. Further, the MSIL code is not
converted in its entirety in one go. Rather, it is converted to native
code by the JIT compiler as and when it is needed at runtime. The
resultant native code is cached so that subsequent calls can use it
directly.
The compiler produces metadata along with this MSIL code on
compilation of any program that is targeted at the CLR's execution environment.
The Metadata contains the assembly manifest that describes the MSIL. It typically
contains the following:
· Definition and signature of all types inside the code
· The types that are referenced inside the code
· Runtime information needed for execution
The assembly metadata helps in the following aspects:
· Verification
· Object Serialization
· Garbage Collection
· Reflection to inspect the types at runtime
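Reflection is the easiest way to see this metadata from within a program. The following is a minimal sketch that reads type definitions, method signatures and assembly references back out of the metadata; the Greeter class and the MetadataDemo helper are hypothetical names introduced only for this illustration.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical class whose metadata is inspected below.
public class Greeter
{
    public static void Greet(string name) { Console.WriteLine("Hello, " + name); }
}

public static class MetadataDemo
{
    // Rebuilds a readable signature from the method's metadata.
    public static string Describe(MethodInfo method)
    {
        string parameters = string.Join(", ",
            method.GetParameters().Select(p => p.ParameterType.FullName + " " + p.Name));
        return method.ReturnType.FullName + " " + method.Name + "(" + parameters + ")";
    }

    public static void Main()
    {
        Type type = typeof(Greeter);

        // Type definition: name and base type.
        Console.WriteLine(type.FullName + " extends " + type.BaseType.FullName);

        // Method signatures declared on the type itself.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Static | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(flags))
            Console.WriteLine("  " + Describe(method));

        // Assemblies referenced by the assembly that contains the type.
        foreach (AssemblyName reference in type.Assembly.GetReferencedAssemblies())
            Console.WriteLine("references " + reference.Name);
    }
}
```

The list of referenced assemblies depends on the runtime and build configuration, so no particular output is assumed for that part.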
Let us now look at the MSIL code that is generated on
compilation of a simple program. We consider the simplest possible code
to avoid unnecessary complexity, shown in Listing 1.
Listing 1: A simple class to display a text in C#
public class test
{
    public static void Main(string[] args)
    {
        System.Console.WriteLine("Joydip Kanjilal");
    }
}
The following is the MSIL code for the class "test" that is
generated on compilation of the source code shown in Listing 1.
Listing 2: The MSIL code that is generated on compilation
.class public auto ansi beforefieldinit test extends [mscorlib]System.Object
{
    .method public hidebysig static void Main(string[] args) cil managed
    {
        .entrypoint
        .maxstack 1
        IL_0000: ldstr "Joydip Kanjilal"
        IL_0005: call void [mscorlib]System.Console::WriteLine(string)
        IL_000a: ret
    }
    .method public hidebysig specialname rtspecialname instance void .ctor() cil managed
    {
        .maxstack 1
        IL_0000: ldarg.0
        IL_0001: call instance void [mscorlib]System.Object::.ctor()
        IL_0006: ret
    }
}
Explanation
IL instructions are encoded as operation codes (opcodes) that
are either 1 byte or 2 bytes long. This section discusses some of the
opcodes and directives that are in frequent use. We will now explain the
MSIL code in Listing 2, generated on compilation of the source code in
Listing 1, to understand these concepts better.
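The opcode sizes can be checked with the System.Reflection.Emit.OpCodes class, which describes every IL instruction. The following is a minimal sketch; the OpCodeDemo class and its Describe helper are hypothetical names. The opcode ceq does not appear in Listing 2 and is included only as an example of a 2-byte opcode.

```csharp
using System;
using System.Reflection.Emit;

public static class OpCodeDemo
{
    // Formats the mnemonic, the encoded size in bytes and the numeric value.
    public static string Describe(OpCode op)
    {
        return string.Format("{0,-6} size={1} value=0x{2:X}",
            op.Name, op.Size, (ushort)op.Value);
    }

    public static void Main()
    {
        // ldstr, call and ret appear in Listing 2; each is a 1-byte opcode.
        // ceq is a 2-byte opcode whose first byte is the 0xFE prefix.
        OpCode[] samples = { OpCodes.Ldstr, OpCodes.Call, OpCodes.Ret, OpCodes.Ceq };
        foreach (OpCode op in samples)
            Console.WriteLine(Describe(op));
    }
}
```

The offsets in Listing 2 follow from these sizes: ldstr occupies 1 byte plus a 4-byte metadata token, which is why the next instruction starts at IL_0005.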
It should be noted that any class in .NET implicitly derives
from the class Object that belongs to the System namespace. Mscorlib.dll
contains the definitions of the base classes from which the other classes
are inherited. The .entrypoint directive indicates that the program's
execution starts from this method. The ret instruction in both methods
(the Main method and the default constructor) returns control to the
caller. Note that the statement that calls the method WriteLine (using the
"call" MSIL instruction) incorporates the method signature (argument types
and return type) as well as the assembly, namespace and class to which
WriteLine belongs. This is helpful in validating the code for
consistency and integrity at runtime.
The MSIL instruction ldstr pushes a reference to the string
literal passed to it onto the evaluation stack. The hidebysig attribute
("hide by name and signature") means that the method hides only those
inherited methods that have the same name and signature, rather than
every inherited method with the same name; it is intended for compilers
and tools and is ignored by the runtime. The auto attribute implies that
the layout of the class is determined by the loader at runtime, while the
ansi attribute specifies that strings are marshaled as ANSI when the class
interoperates with unmanaged code. The public attribute on a class member
implies that the member can be accessed from any other part of the
program. The static attribute implies that a member belongs to the class
and not to its instances, i.e., a static member exists even before the
class is instantiated and is shared across all instances of it. The .ctor
name in Listing 2 denotes an instance constructor. The .maxstack directive
indicates the maximum number of elements that can be on the evaluation
stack while the method is being executed.
This concludes the explanation of the MSIL code shown in Listing 2. The
following section discusses the Portable Executable (PE) and Common Object
File Format (COFF) file formats, in which the MSIL code is stored.
|
Portable Executable Files and Common Object File Format |
The MSIL code and its metadata are contained in a portable
executable (PE) file. The Portable Executable format came into being with
the intent of having a common file format for all flavors of Windows
operating systems on all supported CPUs. It is based on the Common Object
File Format (COFF). This file format, which accommodates MSIL or native
code as well as metadata, enables the operating system to recognize common
language runtime images. The presence of metadata in the file along with
the MSIL makes the code self-describing, eliminating the need for type
libraries or Interface Definition Language (IDL). At runtime, the
metadata is read as and when it is needed for execution.
When a program targeted at the CLR is compiled, the generated
MSIL code is stored in the Portable Executable (PE) format. The PE format
(based on COFF) specifies a portable file format for the executables,
object code and DLLs that are used in Windows operating systems. All .NET
assemblies are PE files, which makes them loadable on all 32-bit and
64-bit Windows operating systems. The code, metadata and resources of the
assembly are stored in the sections of the PE file, and the CLR header
records where the runtime can find them. The following is the layout of a
PE file (also known as an image file) in the COFF format.
Listing 3: The PE File Structure
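The main landmarks of a PE file can be located with a few reads at fixed offsets. The following is a minimal sketch of such a reader, not a complete parser; the names PeProbe, PeInfo and Read are hypothetical. It follows the published PE/COFF layout (the MZ signature, the offset stored at 0x3C, the PE signature, the COFF file header, the optional header and data directory entry 14, which points to the CLR header) and assumes a little-endian host, because BitConverter reads in the host's byte order.

```csharp
using System;
using System.IO;

public static class PeProbe
{
    public sealed class PeInfo
    {
        public ushort Machine;           // target CPU, e.g. 0x014C (x86) or 0x8664 (x64)
        public ushort NumberOfSections;  // number of entries in the section table
        public bool IsPe32Plus;          // true for the 64-bit optional header format
        public uint ClrHeaderRva;        // data directory 14: address of the CLR header
        public uint ClrHeaderSize;
        public bool IsManaged { get { return ClrHeaderRva != 0 && ClrHeaderSize != 0; } }
    }

    public static PeInfo Read(byte[] image)
    {
        // MS-DOS header: starts with "MZ"; offset 0x3C holds the PE header offset.
        if (image.Length < 0x40 || BitConverter.ToUInt16(image, 0) != 0x5A4D)
            throw new InvalidDataException("Missing MZ signature.");
        int pe = BitConverter.ToInt32(image, 0x3C);
        if (pe < 0 || pe + 26 > image.Length || BitConverter.ToUInt32(image, pe) != 0x00004550)
            throw new InvalidDataException("Missing PE signature.");

        int coff = pe + 4;         // COFF file header (20 bytes)
        int optional = coff + 20;  // optional header follows immediately
        ushort magic = BitConverter.ToUInt16(image, optional);
        if (magic != 0x10B && magic != 0x20B)
            throw new InvalidDataException("Unknown optional header magic.");
        bool plus = magic == 0x20B;

        PeInfo info = new PeInfo();
        info.Machine = BitConverter.ToUInt16(image, coff);
        info.NumberOfSections = BitConverter.ToUInt16(image, coff + 2);
        info.IsPe32Plus = plus;

        // Data directories start 96 bytes (PE32) or 112 bytes (PE32+) into the
        // optional header; each entry is an 8-byte (RVA, size) pair.
        int directories = optional + (plus ? 112 : 96);
        if (directories + 15 * 8 > image.Length) return info;
        uint count = BitConverter.ToUInt32(image, directories - 4);
        if (count > 14)
        {
            info.ClrHeaderRva = BitConverter.ToUInt32(image, directories + 14 * 8);
            info.ClrHeaderSize = BitConverter.ToUInt32(image, directories + 14 * 8 + 4);
        }
        return info;
    }

    public static void Main(string[] args)
    {
        // Defaults to the core library's own file when no path is given.
        string path = args.Length > 0 ? args[0] : typeof(object).Assembly.Location;
        PeInfo info = Read(File.ReadAllBytes(path));
        Console.WriteLine("Machine: 0x{0:X4}, sections: {1}, PE32+: {2}, managed: {3}",
            info.Machine, info.NumberOfSections, info.IsPe32Plus, info.IsManaged);
    }
}
```

A non-zero CLR header entry is what distinguishes a managed module from an ordinary native executable, which otherwise shares the same PE layout.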
|
The ILASM, ILDASM and the NGEN tools |
We will conclude with a discussion on some related tools that
ship with the Microsoft .NET Framework SDK: ILASM, ILDASM and NGEN. The
Microsoft Intermediate Language Assembler, or ILASM tool, assembles MSIL
source code and stores the result in a file in the Portable Executable
(PE) format. The Microsoft Intermediate Language Disassembler, or ILDASM
tool, is used to retrieve the MSIL code from a PE file. According to
MSDN, "the MSIL Disassembler is a companion tool to the MSIL Assembler
(Ilasm.exe). Ildasm.exe takes a portable executable (PE) file that
contains Microsoft intermediate language (MSIL) code and creates a text
file suitable as input to Ilasm.exe." The Native Image Generator, or NGEN
tool, compiles the MSIL code of an assembly to native code ahead of time,
so that it need not be JIT-compiled at runtime.
|
References |
|
Conclusion |
This article has discussed the architecture of MSIL, the PE
and COFF file formats and their related concepts and terminology. I have
tried my best to bring all these concepts together in one place and hope
that readers will benefit from this article. Please go through the books
and the links shown in the above section for a detailed understanding of
these concepts.