Understanding Compression and Decompression in ASP.NET 2.0: ASP Alliance



by SANJIT SIL

Introduction

The System.IO.Compression namespace is a new addition in .NET Framework 2.0. It contains two classes, DeflateStream and GZipStream, which provide basic compression and decompression services for streams. DeflateStream compresses and decompresses streams using the Deflate algorithm; GZipStream does the same using the gzip data format. Besides compressing files, another very good use of the new compression features is to implement our own HTTP module class that compresses the HTTP output of our application. The sample code snippets are written in C#.

Compressing Files

We can wrap a file stream so that data is compressed as it is written to disk. The following example compresses a .doc file using the GZipStream class. We can place a button (say btnCompress) on a form and call the Compress method from its click event. After the method runs we get a compressed file that is usually much smaller than the original. To get the original file back from the compressed file, we call the DeCompress method, for example from the click event of a second button (say btnDeCompress). Listing 1 contains both the Compress and DeCompress methods.

Listing 1

using System.IO;
using System.IO.Compression;

private void DeCompress(string srcPath, string dstPath)
{
  using (FileStream srcStream = File.OpenRead(srcPath))
  using (FileStream dstStream = File.Create(dstPath))
  // Wrap the compressed source: reading from dcStream yields decompressed bytes.
  using (GZipStream dcStream = new GZipStream(srcStream,
    CompressionMode.Decompress))
  {
    byte[] buffer = new byte[4096];
    int read;
    while ((read = dcStream.Read(buffer, 0, buffer.Length)) > 0)
      dstStream.Write(buffer, 0, read);
  }
}

private void Compress(string srcPath, string dstPath)
{
  using (FileStream srcStream = File.OpenRead(srcPath))
  using (FileStream dstStream = File.Create(dstPath))
  // Wrap the destination: bytes written to cStream are compressed into dstStream.
  using (GZipStream cStream = new GZipStream(dstStream,
    CompressionMode.Compress))
  {
    // Copy in chunks; a single Read call is not guaranteed to fill the buffer.
    byte[] buffer = new byte[4096];
    int read;
    while ((read = srcStream.Read(buffer, 0, buffer.Length)) > 0)
      cStream.Write(buffer, 0, read);
  }
}
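
To check that compression and decompression really are inverses without touching the disk, we can run the same pattern over MemoryStream objects. The following is only a minimal sketch: the class name GZipRoundTrip and its two method names are hypothetical and are not part of Listing 1 or of the framework.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class for illustration; not part of the .NET Framework.
public static class GZipRoundTrip
{
    // Compresses a byte array into gzip-formatted bytes.
    public static byte[] Compress(byte[] input)
    {
        MemoryStream output = new MemoryStream();
        // The GZipStream must be closed before the result is read,
        // otherwise the last block and the footer are not written.
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(input, 0, input.Length);
        }
        // ToArray still works after the MemoryStream has been closed.
        return output.ToArray();
    }

    // Decompresses gzip-formatted bytes back into the original bytes.
    public static byte[] Decompress(byte[] compressed)
    {
        using (MemoryStream input = new MemoryStream(compressed))
        using (GZipStream gzip = new GZipStream(input, CompressionMode.Decompress))
        using (MemoryStream output = new MemoryStream())
        {
            byte[] buffer = new byte[4096];
            int read;
            while ((read = gzip.Read(buffer, 0, buffer.Length)) > 0)
                output.Write(buffer, 0, read);
            return output.ToArray();
        }
    }
}
```

Calling GZipRoundTrip.Decompress(GZipRoundTrip.Compress(data)) should return a byte array equal to data, and for repetitive input the compressed array should be noticeably shorter than the original.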

Difference between DeflateStream and GZipStream

GZipStream uses the same Deflate algorithm as DeflateStream but wraps the compressed data in the gzip format, which adds a header and a footer containing a Cyclic Redundancy Check (CRC) value used to detect data corruption. This makes GZipStream better at detecting corrupted data than DeflateStream. If the data has been corrupted, an InvalidDataException is thrown with the message, "The CRC in GZip footer does not match the CRC calculated from the decompressed data." Because of the header and footer overhead, GZipStream produces a slightly larger file than DeflateStream for the same input, although the difference in size is not significant. For more information on the formats, readers can see RFC 1951 (DEFLATE Compressed Data Format Specification version 1.3) for DeflateStream and RFC 1952 (GZIP file format specification version 4.3) for GZipStream.
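
The size difference described above can be measured by compressing the same bytes with both classes. This is only a sketch; the class FormatComparison and its method names are hypothetical and chosen for illustration.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper class for illustration; not part of the .NET Framework.
public static class FormatComparison
{
    // Compresses the input as a raw Deflate stream (RFC 1951).
    public static byte[] DeflateBytes(byte[] input)
    {
        MemoryStream output = new MemoryStream();
        using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress))
        {
            deflate.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }

    // Compresses the input in the gzip format (RFC 1952).
    public static byte[] GZipBytes(byte[] input)
    {
        MemoryStream output = new MemoryStream();
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }

    // Number of extra bytes the gzip wrapper adds for this input.
    public static int GZipOverhead(byte[] input)
    {
        return GZipBytes(input).Length - DeflateBytes(input).Length;
    }
}
```

For any input we would expect GZipOverhead to return a small positive number, since the gzip output carries the header and CRC footer in addition to the same compressed data.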

Limitation

DeflateStream and GZipStream can only decompress data in the Deflate and gzip formats respectively. They cannot be used to decompress files produced by other compression techniques, because those formats define their header metadata differently. Both classes can also handle streams of at most 4 GB.
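
Because the header differs between formats, one way to fail early on input that is not gzip is to look at the first two bytes before handing the data to GZipStream. RFC 1952 defines them as 0x1F 0x8B. The sketch below assumes the data is already in a byte array; the class and method names are hypothetical.

```csharp
using System;

// Hypothetical helper class for illustration; not part of the .NET Framework.
public static class GZipSniffer
{
    // Returns true when the data starts with the gzip identification
    // bytes 0x1F 0x8B defined in RFC 1952. This checks the header only;
    // it does not prove that the rest of the data is valid.
    public static bool LooksLikeGZip(byte[] data)
    {
        return data != null
            && data.Length >= 2
            && data[0] == 0x1F
            && data[1] == 0x8B;
    }
}
```

A ZIP archive, for example, starts with the bytes 0x50 0x4B ("PK"), so LooksLikeGZip returns false for it and the caller can report an unsupported format instead of attempting decompression.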

Compressing HTTP output

Several compression formats are supported over HTTP, but gzip is the most widely used and the most compatible. Most graphical browsers, including Internet Explorer and Netscape, support gzip decompression. Compression can be applied to all HTML output, so it works for both static and dynamic files. Images that are already compressed gain nothing from HTTP compression and should be left alone.

There are three ways to implement HTTP compression. The first is to buy a third-party tool that plugs into Internet Information Server (IIS) and enables HTTP compression transparently, outside the application. This works well, but it costs money. The second is to use an open source HTTP compression library written for ASP.NET from inside the application. The third is to use the HTTP compression built into IIS, although Netscape 4 does not handle it properly.

The new System.IO.Compression namespace in .NET 2.0 makes it easy to implement HTTP compression without touching IIS. We no longer need third-party compression components because everything is built into the .NET Framework. There are different ways to implement the compression, but I think an HTTP module is the right choice for this feature.

HTTP compression provides faster transmission between compression-enabled browsers (Microsoft Internet Explorer 5.0 or later) and IIS. We can compress static files alone, or both static files and applications. If network bandwidth is restricted, we should consider HTTP compression, at least for static files.

Here I have created a class, HttpCompressionModule, whose Init method attaches a handler to the Application.BeginRequest event; that handler does the compression. The class has to implement the IHttpModule interface, as shown in Listing 2.

Listing 2

public class HttpCompressionModule : IHttpModule

We register the module in web.config, inside the <system.web> section, as shown in Listing 3:

Listing 3

<httpModules>
<add type="HttpCompressionModule" name="HttpCompressionModule"/>
</httpModules>

We have to implement the two members of the IHttpModule interface, Dispose and Init, as shown in Listings 4 and 5. The BeginRequest handler that Init attaches is shown in Listing 6.

Listing 4

void IHttpModule.Dispose()
{
  // Nothing to clean up.
}

Listing 5

void IHttpModule.Init(HttpApplication context)
{
  context.BeginRequest += new EventHandler(context_BeginRequest);
}

Listing 6

void context_BeginRequest(object sender, EventArgs e)
{
  HttpApplication app = (HttpApplication)sender;

  // Prefer gzip; fall back to deflate if that is all the client accepts.
  if (IsEncodingAccepted("gzip"))
  {
    // Wrapping Response.Filter compresses everything written to the response.
    app.Response.Filter = new GZipStream(app.Response.Filter,
        CompressionMode.Compress);
    SetEncoding("gzip");
  }
  else if (IsEncodingAccepted("deflate"))
  {
    app.Response.Filter = new DeflateStream(app.Response.Filter,
        CompressionMode.Compress);
    SetEncoding("deflate");
  }
}

The following two methods support the handler. Listing 7 checks the request headers to see whether the client accepts the specified encoding, and Listing 8 adds the specified encoding to the response headers.

Listing 7

private bool IsEncodingAccepted(string encoding)
{
  return HttpContext.Current.Request.Headers["Accept-encoding"] != null &&
      HttpContext.Current.Request.Headers["Accept-encoding"].Contains(encoding);
}

Listing 8

private void SetEncoding(string encoding)
{
  HttpContext.Current.Response.AppendHeader("Content-encoding", encoding);
}
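
The substring check in Listing 7 is simple, but it also matches a header such as "gzip;q=0", in which the client explicitly refuses gzip, and a token such as "x-gzip". A stricter check could split the header into its comma-separated entries and read the optional q value. The following is a sketch only, under the assumption that the header follows the usual "name;q=value" form; the class AcceptEncodingParser is hypothetical, and the sketch does not handle the "*" wildcard.

```csharp
using System;
using System.Globalization;

// Hypothetical helper class for illustration; not part of ASP.NET.
public static class AcceptEncodingParser
{
    // Returns true when the Accept-Encoding header value lists the given
    // encoding with a non-zero quality value. A missing q parameter
    // counts as q=1.
    public static bool Accepts(string headerValue, string encoding)
    {
        if (string.IsNullOrEmpty(headerValue))
            return false;

        foreach (string entry in headerValue.Split(','))
        {
            string[] pieces = entry.Split(';');
            string name = pieces[0].Trim();
            if (!string.Equals(name, encoding, StringComparison.OrdinalIgnoreCase))
                continue;

            double q = 1.0;
            for (int i = 1; i < pieces.Length; i++)
            {
                string param = pieces[i].Trim();
                if (param.StartsWith("q=", StringComparison.OrdinalIgnoreCase))
                {
                    double parsed;
                    if (double.TryParse(param.Substring(2), NumberStyles.Float,
                            CultureInfo.InvariantCulture, out parsed))
                        q = parsed;
                }
            }
            // q=0 means the client does not want this encoding.
            return q > 0.0;
        }
        return false;
    }
}
```

Inside the module, IsEncodingAccepted could then pass the value of the Accept-Encoding request header to AcceptEncodingParser.Accepts instead of calling Contains.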

Role of IIS on Receiving Request

When IIS receives a request, it first checks whether the browser is compression-enabled or not. If the compression is enabled, IIS then checks the file name extension to see if the requested file is a static file or contains dynamic content. If the file contains static content, IIS checks to see if the file has previously been requested and is already stored in a compressed format in the temporary compression directory. If the file is not stored in a compressed format, IIS sends the uncompressed file to the browser and adds a compressed copy of the file to the temporary compression directory. If the file is stored in a compressed format, IIS sends the compressed file to the browser. Files are compressed only when they have been requested at least one time by a browser.

If the file contains dynamic content, IIS compresses the file as it is generated and sends the compressed file to the browser. Unlike static content, no copy of the file is stored.

The cost of compressing a static file is modest and is typically incurred only one time because the file is then stored in the temporary compression directory. The cost of compressing dynamically generated files is a little higher because they are not stored and must be generated and compressed again on each request. The cost of expanding the file at the browser is minimal. Compressed files download faster, which particularly benefits any browser on a network connection with restricted bandwidth (a modem, for example).
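
The static-file behaviour described above can be modelled in a few lines. This is only an illustrative model of the described flow, not how IIS is actually implemented: the class StaticCompressionModel is hypothetical, and an in-memory dictionary stands in for the temporary compression directory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical model for illustration; this is not IIS code.
public class StaticCompressionModel
{
    // Stands in for the temporary compression directory.
    private readonly Dictionary<string, byte[]> compressedCache =
        new Dictionary<string, byte[]>();

    // Returns the bytes to send; wasCompressed reports which form was chosen.
    public byte[] Serve(string path, byte[] content, bool clientAcceptsGzip,
        out bool wasCompressed)
    {
        wasCompressed = false;
        if (!clientAcceptsGzip)
            return content;

        byte[] cached;
        if (compressedCache.TryGetValue(path, out cached))
        {
            // A compressed copy already exists, so send it.
            wasCompressed = true;
            return cached;
        }

        // First request: send the original and keep a compressed copy
        // for later requests.
        compressedCache[path] = GZip(content);
        return content;
    }

    private static byte[] GZip(byte[] input)
    {
        MemoryStream output = new MemoryStream();
        using (GZipStream gzip = new GZipStream(output, CompressionMode.Compress))
        {
            gzip.Write(input, 0, input.Length);
        }
        return output.ToArray();
    }
}
```

In this model the first request for a path returns the uncompressed bytes, and a second request for the same path from a compression-enabled client returns the stored compressed copy, which mirrors the rule that files are compressed only after they have been requested once.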

References

MSDN - System.IO.Compression Namespace

HttpCompression

Conclusion

Compression is really useful when transferring large amounts of data or large files to remote computers over a network. HTTP compression speeds up web page downloads and saves bandwidth. As has rightly been said, "when tied to other methods, such as proper caching configurations and the use of persistent connections, HTTP compression can greatly improve Web performance. In most cases, the total cost of ownership of implementing HTTP compression (which for users of some Web platforms is nothing!) is extremely low, and it will pay for itself in reduced bandwidth usage and improved customer satisfaction."