ASPAlliance: Articles, reviews, and samples for .NET Developers
URL:
http://aspalliance.com/articleViewer.aspx?aId=1287&pId=-1
Compression and Decompression of Files using Visual Basic 2005
by Abhishek Kumar Singh

Introduction

Compression is a technique for reducing the size of data (or a file) by applying an encoding algorithm. The output of compression is called compressed or zipped data (or a file). Decompression is the reverse process: it recovers the source data (or file) by applying the matching decoding algorithm. The output of decompression is called decompressed or unzipped data (or a file).

We routinely move data from one computer to another (over a LAN, WAN, etc.) and from one device to another. In most cases the data being transferred contains redundancy. Compression and decompression algorithms exploit the attributes of this redundancy (e.g. the number of occurrences of a pattern, the locations of those occurrences, etc.). Put simply, the more redundancy the data (or file) contains, the smaller the compressed data (or file) will be. In compressed data communication, the sender and the receiver must agree on the encoding and decoding schemes, or both must follow the same standard.
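The effect of redundancy on the compressed size can be checked with a small console sketch. It is only an illustration: the module name RedundancyDemo, the helper CompressedSize and the sample inputs are assumptions made for this example, and it uses the GZipStream class that is introduced later in this article.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module RedundancyDemo

  ' Hypothetical helper: returns the number of bytes produced by
  ' gzip-compressing the input array.
  Public Function CompressedSize(ByVal data() As Byte) As Integer
    Using target As New MemoryStream()
      Using gz As New GZipStream(target, CompressionMode.Compress, True)
        gz.Write(data, 0, data.Length)
      End Using ' closing the GZipStream flushes the remaining compressed data
      Return CInt(target.Length)
    End Using
  End Function

  Sub Main()
    ' Highly redundant input: the same byte repeated 10,000 times.
    Dim repeated(9999) As Byte
    For i As Integer = 0 To repeated.Length - 1
      repeated(i) = 65
    Next

    ' Input with little redundancy: pseudo-random bytes (fixed seed).
    Dim noise(9999) As Byte
    Dim rng As New Random(12345)
    rng.NextBytes(noise)

    Console.WriteLine("Repeated bytes: {0} -> {1}", repeated.Length, CompressedSize(repeated))
    Console.WriteLine("Random bytes:   {0} -> {1}", noise.Length, CompressedSize(noise))
  End Sub

End Module
```

The repeated input should shrink to a small fraction of its size, while the pseudo-random input barely shrinks and may even grow.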

The purpose of compression is to save disk space when data is stored on a computer and to use less bandwidth when data is transferred over a network. Compression is sometimes credited with security benefits as well, although it is not a substitute for encryption.

In general we use different data formats such as text, image, audio and video in our day-to-day communication and data transfer. A particular encoding scheme may not be effective for every data format. For example, an encoding scheme that works well on text data will not necessarily give good results on image data.

What are the types of Compression?

In general we divide compression algorithms into two types:

1. Lossless Compression

2. Lossy Compression

Lossless Compression

In lossless compression, the algorithm does not lose any part of the data: decompression must reproduce the source data exactly. In business communication we use lossless compression for text-based data and information. Lossless compression must also be used for executable programs, because changing even a single bit can break a program. The main goal, as well as the main challenge, for lossless algorithms is to compress the data as much as possible and to restore the original data in the minimum time.

Lossy Compression

In lossy compression, the algorithm discards some data to achieve higher compression, so you do not get the exact original data back. This kind of compression suits data in which some loss of fidelity is acceptable. Picture, audio and video data are usually compressed with lossy algorithms because some loss of data can be tolerated there. A graphical image can be compressed with either a lossless or a lossy method, depending on the requirement.

There are various lossless encoding schemes and formats (e.g. run-length encoding, Lempel-Ziv (LZ), Huffman, Deflate, GZip, TTA, FLAC, etc.) and lossy encoding schemes (e.g. the MPEG-2 codecs, which for audio rely on psychoacoustic models).

Among these, I am going to describe DEFLATE and GZIP compression and their implementation in VB 2005.

Deflate and GZip Compression in .NET Framework 2.0

Before the release of .NET Framework 2.0 we had to depend on third-party libraries whenever we needed to compress files. .NET Framework 2.0 now ships native classes for this. DeflateStream and GZipStream are new base class library classes in .NET Framework 2.0, in the System.IO.Compression namespace of the System assembly.

DeflateStream

Provides methods and properties to compress and decompress streams using the Deflate algorithm.

GZipStream

Provides methods and properties to compress and decompress streams using the gzip data format.

DEFLATE is a patent-free compression algorithm that combines the Lempel-Ziv (LZ77) and Huffman compression methods. It is implemented in the DeflateStream class. You can find more details about the Deflate algorithm in the IETF's RFC 1951.

GZip is short for GNU Zip. It is based on the DEFLATE algorithm and wraps the deflated data in a header and a trailer; the file format is described in the IETF's RFC 1952.

Similarities in DeflateStream and GZipStream

·         Both classes share almost the same underlying algorithm.

·         The implementation techniques for both classes are very similar in VB 2005.

·         Both classes normally compress a single file (stream) at a time. To compress multiple files we first need to assemble them into a tar archive and then apply GZip. We can also concatenate the compressed streams of multiple files to achieve that using GZip alone.

Difference between DeflateStream and GZipStream

·         GZipStream includes a Cyclic Redundancy Check (CRC) value, which lets data corruption be detected during decompression; DeflateStream has no such check. Because of the additional gzip header and trailer, GZipStream produces slightly larger output than DeflateStream for the same input.
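The size difference can be observed with a short sketch that compresses the same bytes with both classes. The module name OverheadDemo, the helper names and the sample text are assumptions made for this example. The gzip format (RFC 1952) adds a header of at least 10 bytes and an 8-byte trailer holding the CRC-32 and the original size, so the gzip output is expected to be larger by roughly that amount.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module OverheadDemo

  ' Hypothetical helper: size of the raw Deflate output for the given data.
  Public Function DeflateSize(ByVal data() As Byte) As Integer
    Using target As New MemoryStream()
      Using ds As New DeflateStream(target, CompressionMode.Compress, True)
        ds.Write(data, 0, data.Length)
      End Using
      Return CInt(target.Length)
    End Using
  End Function

  ' Hypothetical helper: size of the gzip output for the same data.
  Public Function GZipSize(ByVal data() As Byte) As Integer
    Using target As New MemoryStream()
      Using gz As New GZipStream(target, CompressionMode.Compress, True)
        gz.Write(data, 0, data.Length)
      End Using
      Return CInt(target.Length)
    End Using
  End Function

  Sub Main()
    Dim data() As Byte = Encoding.UTF8.GetBytes(New String("a"c, 5000))
    Console.WriteLine("Deflate: {0} bytes", DeflateSize(data))
    Console.WriteLine("GZip:    {0} bytes", GZipSize(data))
  End Sub

End Module
```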

Implementation of GZipStream in VB 2005

Implementation of GZipStream compression

The GZipStream class provides methods which can be used to compress and decompress data or files in the gzip format. Note that this is not the ZIP archive format.

Listing 1 – Function to compress a Byte array and return the compressed Byte array

Imports System.IO
Imports System.IO.Compression

''' <summary>
'''  Byte to Byte compression.
''' </summary>
''' <param name="byteSource">Byte() to compress</param>
''' <returns>Compressed Byte()</returns>
''' <remarks></remarks>
Public Function CompressByte(ByVal byteSource() As Byte) As Byte()
  ' Create a GZipStream object and memory stream object to store compressed stream
  Dim objMemStream As New MemoryStream()
  Dim objGZipStream As New GZipStream(objMemStream, CompressionMode.Compress, True)
  objGZipStream.Write(byteSource, 0, byteSource.Length)
  ' Dispose flushes the remaining data and writes the gzip trailer
  objGZipStream.Dispose()
  objMemStream.Position = 0
  ' Write compressed memory stream into byte array.
  ' The declaration takes the upper bound, so Length - 1 gives exactly Length elements.
  Dim buffer(CInt(objMemStream.Length) - 1) As Byte
  objMemStream.Read(buffer, 0, buffer.Length)
  objMemStream.Dispose()
  Return buffer
End Function

Description

·         objGZipStream.Write(byteSource, 0, byteSource.Length) compresses the source byte array and writes the result through the GZipStream object into the memory stream.

·         objGZipStream.Dispose() flushes any buffered data and writes the gzip trailer. Because the third constructor argument (leaveOpen) is True, the memory stream stays open.

·         objMemStream.Position = 0 sets the current position of the compressed stream to the beginning, so that the following read starts at the header of the stream.

·         Dim buffer(CInt(objMemStream.Length) - 1) As Byte declares the result array. VB array declarations take the upper bound, not the element count, so Length - 1 yields exactly Length elements and no extra zero byte is appended to the compressed data.

·         objMemStream.Read(buffer, 0, buffer.Length) copies the compressed memory stream into the byte array buffer.

Listing 2 – Function to compress a Byte array and return the compressed memory stream

''' <summary>
''' Byte to stream compression.
''' </summary>
''' <param name="byteSource">Byte() to compress</param>
''' <returns>MemoryStream (compressed), positioned at the beginning</returns>
''' <remarks></remarks>
Public Function CompressData(ByVal byteSource As Byte()) As MemoryStream

  ' Destination stream that receives the compressed bytes
  Dim objDestinationStream As New MemoryStream()

  ' leaveOpen = True keeps the destination stream usable after the GZipStream is closed
  Dim compressedStream As New GZipStream(objDestinationStream, CompressionMode.Compress, True)
  compressedStream.Write(byteSource, 0, byteSource.Length)

  ' Closing the GZipStream flushes the remaining data and writes the gzip trailer
  compressedStream.Close()

  ' Rewind so that the caller can read the compressed data from the start.
  ' The destination stream is not closed here because it is the return value.
  objDestinationStream.Position = 0
  Return objDestinationStream
End Function

Description

·         compressedStream = New GZipStream(objDestinationStream, CompressionMode.Compress, True) creates a compression stream that writes into the destination stream.

·         compressedStream.Write(byteSource, 0, byteSource.Length) compresses the source byte array into the destination stream.

·         compressedStream.Close() completes the gzip data. The destination stream itself must stay open, because a closed MemoryStream cannot be read by the caller.

Listing 3 – Function to decompress the compressed Byte array

''' <summary>
''' Decompress the byte array (compressed) by using GZipStream class library.
''' </summary>
''' <param name="byteCompressed">Compressed Byte()</param>
''' <returns>Decompressed Byte(), or Nothing if decompression fails</returns>
''' <remarks></remarks>
Public Function DecompressByte(ByVal byteCompressed() As Byte) As Byte()

  Try
    ' Initialize memory stream with byte array.
    Dim objMemStream As New MemoryStream(byteCompressed)

    ' Initialize GZipStream object with memory stream.
    Dim objGZipStream As New GZipStream(objMemStream, CompressionMode.Decompress)

    ' Define a byte array to store the size field from the gzip trailer.
    Dim sizeBytes(3) As Byte

    ' Read the size of the original data, stored in the last 4 bytes.
    objMemStream.Position = objMemStream.Length - 4
    objMemStream.Read(sizeBytes, 0, 4)

    Dim iOutputSize As Integer = BitConverter.ToInt32(sizeBytes, 0)

    ' Position at the beginning of the memory stream to read
    ' the compressed stream for decompression.
    objMemStream.Position = 0

    Dim decompressedBytes(iOutputSize - 1) As Byte

    ' Read the decompressed bytes into the result byte array.
    ' Read may return fewer bytes than requested, so repeat until done.
    Dim offset As Integer = 0
    While offset < iOutputSize
      Dim bytesRead As Integer = objGZipStream.Read(decompressedBytes, offset, iOutputSize - offset)
      If bytesRead = 0 Then
        Exit While
      End If
      offset += bytesRead
    End While

    objGZipStream.Dispose()
    objMemStream.Dispose()

    Return decompressedBytes

  Catch ex As Exception
    ' Invalid or corrupted input ends up here
    Return Nothing
  End Try

End Function

Description

·         objMemStream.Position = objMemStream.Length - 4 and objMemStream.Read(sizeBytes, 0, 4) read the size of the original data, which the gzip format stores in the last four bytes of the compressed stream (modulo 2^32).

·         The loop around objGZipStream.Read(decompressedBytes, offset, iOutputSize - offset) reads the decompressed bytes and writes them into the result byte array. A single call to Read may return fewer bytes than requested, so the call is repeated until the array is full or the stream ends.
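The size field mentioned above can be inspected directly. The following sketch is a hedged illustration, not part of the original application: the module name GZipTrailerSketch and both helper names are assumptions. It relies on the gzip format (RFC 1952) storing the original length, modulo 2^32, as a little-endian value in the final four bytes of a gzip stream that has no trailing bytes after it, such as the output of MemoryStream.ToArray().

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module GZipTrailerSketch

  ' Hypothetical helper: compresses data and returns the complete gzip stream.
  Public Function GZipBytes(ByVal data() As Byte) As Byte()
    Using target As New MemoryStream()
      Using gz As New GZipStream(target, CompressionMode.Compress, True)
        gz.Write(data, 0, data.Length)
      End Using ' closing the GZipStream writes the trailer
      Return target.ToArray()
    End Using
  End Function

  ' Hypothetical helper: decodes the last four bytes as a little-endian
  ' unsigned value. A Long is used so that sizes above 2 GB stay positive.
  Public Function ReadOriginalSize(ByVal gzipData() As Byte) As Long
    Dim n As Integer = gzipData.Length
    Return CLng(gzipData(n - 4)) Or _
           (CLng(gzipData(n - 3)) << 8) Or _
           (CLng(gzipData(n - 2)) << 16) Or _
           (CLng(gzipData(n - 1)) << 24)
  End Function

  Sub Main()
    Dim sample(1233) As Byte ' 1234 bytes of zeros
    Dim compressed() As Byte = GZipBytes(sample)
    Console.WriteLine("Original size:   {0}", sample.Length)
    Console.WriteLine("Size in trailer: {0}", ReadOriginalSize(compressed))
  End Sub

End Module
```

Because the field is only 32 bits wide and is not verified before decompression, it is best treated as a hint for sizing buffers rather than as a guarantee.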

Implementation of GZipStream decompression

Listing 4 – Function to decompress the compressed memory stream

''' <summary>
''' Function to decompress the memory stream (compressed) by using GZipStream class library.
''' Note that it is important for the caller to do the compressing with GZipStream.
''' </summary>
''' <param name="compressedStream">MemoryStream (compressed)</param>
''' <param name="buffLength">Expected size of the decompressed data</param>
''' <returns>Byte() (decompressed)</returns>
''' <remarks></remarks>
Public Function DecompressData(ByVal compressedStream As MemoryStream, _
ByVal buffLength As Integer) As Byte()

  compressedStream.Position = 0
  Dim zipStream As New GZipStream(compressedStream, CompressionMode.Decompress)

  ' ReadAllBytesFromStream reads in blocks of 100 bytes, so leave room
  ' for one extra block beyond the expected length.
  Dim decompressedBuffer(buffLength + 100) As Byte

  ' Use the ReadAllBytesFromStream to read the stream.
  Dim totalCount As Integer = ReadAllBytesFromStream(zipStream, decompressedBuffer)

  ' Trim the buffer to the number of bytes actually read.
  Array.Resize(decompressedBuffer, totalCount)

  Return decompressedBuffer
End Function

Description

·         compressedStream.Position = 0 resets the memory stream position to the beginning before decompression starts.

·         Dim zipStream As New GZipStream(compressedStream, CompressionMode.Decompress) wraps the compressed stream in a stream that returns decompressed data when it is read.

·         ReadAllBytesFromStream(zipStream, decompressedBuffer) fills the buffer with the decompressed bytes and returns the number of bytes read. See Listing 5 for the definition of function ReadAllBytesFromStream.

·         Array.Resize(decompressedBuffer, totalCount) trims the buffer so that the returned array contains only the decompressed bytes, without trailing zero bytes.

Listing 5 – Function to read all bytes from a stream into a byte array and return the number of bytes read

''' <summary>
''' Reads all bytes from the stream into the buffer.
''' </summary>
''' <param name="stream">Input stream</param>
''' <param name="buffer">Byte() that receives the data</param>
''' <returns>Total number of bytes read</returns>
''' <remarks></remarks>
Public Function ReadAllBytesFromStream(ByVal stream As Stream, _
ByVal buffer As Byte()) As Integer
  ' This method reads all bytes from a stream in blocks of up to 100 bytes.
  Dim offset As Integer = 0
  Dim totalCount As Integer = 0
  While True
    ' Never request more bytes than the buffer can still hold.
    Dim bytesToRead As Integer = Math.Min(100, buffer.Length - offset)
    If bytesToRead = 0 Then
      Exit While
    End If
    Dim bytesRead As Integer = stream.Read(buffer, offset, bytesToRead)
    If bytesRead = 0 Then
      Exit While
    End If
    offset += bytesRead
    totalCount += bytesRead
  End While
  Return totalCount
End Function 'ReadAllBytesFromStream

Description

Function ReadAllBytesFromStream reads the stream in a loop, up to 100 bytes at a time, stores the bytes in the buffer and returns the total number of bytes read. It stops when the stream ends or the buffer is full. You can use another value for the maximum number of bytes to read in each call.
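When the decompressed size is not known in advance, one alternative is to collect the output in a growable MemoryStream instead of a preallocated array. The sketch below shows this under stated assumptions: the module DecompressSketch and the functions CompressToArray and DecompressToArray are hypothetical names introduced for the example, and the 4096-byte block size is arbitrary.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module DecompressSketch

  ' Hypothetical helper used to produce test input for the round trip.
  Public Function CompressToArray(ByVal data() As Byte) As Byte()
    Using target As New MemoryStream()
      Using gz As New GZipStream(target, CompressionMode.Compress, True)
        gz.Write(data, 0, data.Length)
      End Using
      Return target.ToArray()
    End Using
  End Function

  ' Decompresses gzip data without knowing the output size beforehand.
  Public Function DecompressToArray(ByVal compressed() As Byte) As Byte()
    Using source As New MemoryStream(compressed)
      Using gz As New GZipStream(source, CompressionMode.Decompress)
        Using result As New MemoryStream()
          Dim chunk(4095) As Byte
          Dim bytesRead As Integer = gz.Read(chunk, 0, chunk.Length)
          While bytesRead > 0
            ' Append only the bytes that were actually read
            result.Write(chunk, 0, bytesRead)
            bytesRead = gz.Read(chunk, 0, chunk.Length)
          End While
          Return result.ToArray()
        End Using
      End Using
    End Using
  End Function

  Sub Main()
    Dim original(9999) As Byte
    For i As Integer = 0 To original.Length - 1
      original(i) = CByte(i Mod 7)
    Next
    Dim restored() As Byte = DecompressToArray(CompressToArray(original))
    Console.WriteLine("Original: {0} bytes, restored: {1} bytes", original.Length, restored.Length)
  End Sub

End Module
```

The result array has exactly the length of the decompressed data, so the caller needs neither a size parameter nor a trimming step.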

Test GZipStream functionality with a Windows application

To test the functions above, create a Windows application in VB 2005 and add two text box controls to the form. Name them txtFilePath and txtDestainationFolder. Add two labels for these text boxes and set their Text properties to "File to Compress/Decompress" and "Destination Folder".

Add two command buttons named btnCompress and btnDecompress and set their Text properties to "Compress file" and "Decompress Zipped file" respectively.

Add two more command buttons named btnBrowseFile and btnBrowseFolder. Set the Text property to "Browse" for both.

Listing 6 – Snapshot of the compression and decompression Windows form

Add a class file CComp.vb to the application and implement the compression function CompressByte as given in Listing 1 above. Add one more class file CDecomp.vb and implement the decompression function DecompressByte as given in Listing 3.

Implement click event of btnCompress.

Listing 7 – Click event implementation of button btnCompress

Private Sub btnCompress_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
Handles btnCompress.Click

  Dim byteSource() As Byte
  Dim byteCompressed() As Byte
  Dim oCCompression As New CComp

  ' Validate paths
  If IO.File.Exists(txtFilePath.Text) = False Then
    MsgBox("Please specify the valid file path to compress", MsgBoxStyle.OkOnly)
    txtFilePath.Focus()
    Exit Sub
  Else
    If IO.Directory.Exists(txtDestainationFolder.Text) = False Then
      MsgBox("Please specify the valid path of destination folder", MsgBoxStyle.OkOnly)
      txtDestainationFolder.Focus()
      Exit Sub
    End If
  End If

  Try
    ' Take the file name (the part after the last backslash) from the source path
    Dim sFileName As String = txtFilePath.Text.Substring(txtFilePath.Text.LastIndexOf("\") + 1)

    byteSource = System.IO.File.ReadAllBytes(txtFilePath.Text)
    byteCompressed = oCCompression.CompressByte(byteSource)

    System.IO.File.WriteAllBytes(txtDestainationFolder.Text & "\" & sFileName & ".zip", byteCompressed)

    MsgBox("File compressed successfully and placed in destination folder", _
      MsgBoxStyle.OkOnly, "Compression")

  Catch ex As Exception
    MsgBox("Compression failed. Reason: " & ex.ToString())
  End Try

End Sub

Description

·         txtFilePath.Text.Substring(txtFilePath.Text.LastIndexOf("\") + 1) returns the file name, which we later use to name the compressed file. The file name and extension have no effect on the compression itself, so we can choose any name we like.

·         byteSource = System.IO.File.ReadAllBytes(txtFilePath.Text) gets the content of the file as a byte array.

·         byteCompressed = oCCompression.CompressByte(byteSource) gets the compressed byte array by using the CompressByte() function.

·         System.IO.File.WriteAllBytes(txtDestainationFolder.Text & "\" & sFileName & ".zip", byteCompressed) writes the compressed byte array into the file <filename>.zip. Any extension can be used for the compressed file. Keep in mind that the content is in gzip format rather than a ZIP archive, so .gz is the conventional extension; an archiver such as WinZip can decompress the file only if it recognizes gzip data.
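As an alternative to the Substring and string concatenation calls, the System.IO.Path class can extract the file name and build the destination path. This is a minimal sketch rather than the author's code: the module PathSketch, the function BuildOutputPath and the sample paths are assumptions made for the example.

```vb
Imports System
Imports System.IO

Module PathSketch

  ' Hypothetical helper: combines the destination folder with the file name
  ' taken from sourcePath and appends the given extension.
  Public Function BuildOutputPath(ByVal sourcePath As String, _
                                  ByVal destinationFolder As String, _
                                  ByVal extension As String) As String
    Dim fileName As String = Path.GetFileName(sourcePath)
    ' Path.Combine inserts the directory separator only when it is needed
    Return Path.Combine(destinationFolder, fileName & extension)
  End Function

  Sub Main()
    ' Sample paths, assumed for illustration only
    Console.WriteLine(BuildOutputPath("C:\data\report.xml", "C:\out", ".gz"))
  End Sub

End Module
```

Path.Combine also copes with a destination folder that already ends with a separator, which the plain concatenation with "\" does not.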

Implement the click event of btnDecompress:

Listing 8 – Click event implementation of button btnDecompress

Private Sub btnDecompress_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) _
Handles btnDecompress.Click

  Dim oCDecompression As New CDecomp
  Dim byteDecompressed() As Byte

  ' Validate paths
  If IO.File.Exists(txtFilePath.Text) = False Then
    MsgBox("Please specify the valid file path to decompress", MsgBoxStyle.OkOnly)
    txtFilePath.Focus()
    Exit Sub
  Else
    If IO.Directory.Exists(txtDestainationFolder.Text) = False Then
      MsgBox("Please specify the valid path of destination folder", MsgBoxStyle.OkOnly)
      txtDestainationFolder.Focus()
      Exit Sub
    End If
  End If

  Try
    ' Take the file name from the source path and drop its last extension
    Dim sFileName As String = txtFilePath.Text.Substring(txtFilePath.Text.LastIndexOf("\") + 1)
    sFileName = sFileName.Remove(sFileName.LastIndexOf("."))

    byteDecompressed = oCDecompression.DecompressByte(System.IO.File.ReadAllBytes(txtFilePath.Text))

    ' DecompressByte returns Nothing for invalid input; WriteAllBytes then
    ' throws, and the Catch block below reports the failure.
    System.IO.File.WriteAllBytes(txtDestainationFolder.Text & "\" & sFileName, byteDecompressed)
    MsgBox("File decompressed successfully and placed in destination folder", _
      MsgBoxStyle.OkOnly, "Decompression")

  Catch ex As Exception
    MsgBox("Decompression failed. Reason: File used to decompress may not have gzip compression format")
  End Try
End Sub

Description

·         Dim sFileName As String = txtFilePath.Text.Substring(txtFilePath.Text.LastIndexOf("\") + 1) extracts the file name that is used to name the decompressed output file.

·         sFileName = sFileName.Remove(sFileName.LastIndexOf(".")) removes the .zip extension.

·         byteDecompressed = oCDecompression.DecompressByte(System.IO.File.ReadAllBytes(txtFilePath.Text)) gets the decompressed byte array by using the DecompressByte() function. ReadAllBytes takes the path of the compressed file and passes its content to DecompressByte().

·         System.IO.File.WriteAllBytes(txtDestainationFolder.Text & "\" & sFileName, byteDecompressed) writes the decompressed byte array into a file.

Build the application and run it. When the form opens, use the browse buttons to specify the file to be compressed and the destination folder in the text boxes. Test the compression and decompression functionality of the application. You should get the output file in the destination folder which you set on the form. Compare the sizes of the original and compressed files. I suggest reading the "Limitations of GZipStream and DeflateStream in .NET Framework 2.0" section of this article before testing the application with different types and sizes of files to find the compression ratio.

In the Windows application example above I have used byte array to byte array compression. In real-life situations these functions are also useful when you need to compress data that moves from memory to memory over a network, where reading or writing files is not required. For example, you might compress a DataSet serialized as XML at one end and decompress it at the other end to get the original XML back. You can use the functions CompressByte and DecompressByte defined above for this.
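The memory-to-memory scenario can be sketched for XML text as follows. The example is illustrative only: the module XmlTransferSketch, the functions CompressText and DecompressText, the UTF-8 encoding and the sample XML are assumptions, and the network transfer itself is left out.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression
Imports System.Text

Module XmlTransferSketch

  ' Sender side (hypothetical helper): encode the text as UTF-8 and gzip it.
  Public Function CompressText(ByVal text As String) As Byte()
    Dim raw() As Byte = Encoding.UTF8.GetBytes(text)
    Using target As New MemoryStream()
      Using gz As New GZipStream(target, CompressionMode.Compress, True)
        gz.Write(raw, 0, raw.Length)
      End Using
      Return target.ToArray()
    End Using
  End Function

  ' Receiver side (hypothetical helper): gunzip and decode the UTF-8 bytes.
  Public Function DecompressText(ByVal compressed() As Byte) As String
    Using source As New MemoryStream(compressed)
      Using gz As New GZipStream(source, CompressionMode.Decompress)
        Using result As New MemoryStream()
          Dim chunk(4095) As Byte
          Dim bytesRead As Integer = gz.Read(chunk, 0, chunk.Length)
          While bytesRead > 0
            result.Write(chunk, 0, bytesRead)
            bytesRead = gz.Read(chunk, 0, chunk.Length)
          End While
          Return Encoding.UTF8.GetString(result.ToArray())
        End Using
      End Using
    End Using
  End Function

  Sub Main()
    Dim document As String = "<orders><order id=""1"" /><order id=""2"" /></orders>"
    Dim packed() As Byte = CompressText(document)
    Dim restored As String = DecompressText(packed)
    Console.WriteLine("Round trip succeeded: {0}", restored = document)
  End Sub

End Module
```

For a very short document like this sample the gzip header and trailer can outweigh the savings; the benefit shows with larger, repetitive XML.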

We can also implement byte array to memory stream compression by replacing CompressByte() with the function CompressData() defined in Listing 2, and/or by replacing DecompressByte() with DecompressData() defined in Listing 4.

Implementation of DeflateStream

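DeflateStream can be used in the same way as GZipStream. To get Deflate compression, replace the GZipStream class name with DeflateStream in the compression code above. Note that Deflate output has no trailer containing the original size, so the decompression code must read the stream until it ends (as ReadAllBytesFromStream in Listing 5 does) instead of reading the size from the last bytes.

Run the application and check the size of the compressed files under both compression schemes. Expect the GZip output to be slightly larger than the Deflate output, because of the gzip header and trailer.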
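Because both classes derive from Stream, switching between them can be reduced to a single decision point. The sketch below is one possible arrangement, not the article's implementation: the module AlgorithmSwitchSketch, the functions Compress and Decompress and the useGZip flag are names assumed for this example.

```vb
Imports System
Imports System.IO
Imports System.IO.Compression

Module AlgorithmSwitchSketch

  ' Compresses data with GZipStream or DeflateStream, chosen by useGZip.
  Public Function Compress(ByVal data() As Byte, ByVal useGZip As Boolean) As Byte()
    Using target As New MemoryStream()
      Dim compressor As Stream
      If useGZip Then
        compressor = New GZipStream(target, CompressionMode.Compress, True)
      Else
        compressor = New DeflateStream(target, CompressionMode.Compress, True)
      End If
      compressor.Write(data, 0, data.Length)
      compressor.Close() ' flushes the remaining compressed data
      Return target.ToArray()
    End Using
  End Function

  ' Decompresses data; useGZip must match the value used for Compress.
  Public Function Decompress(ByVal compressed() As Byte, ByVal useGZip As Boolean) As Byte()
    Using source As New MemoryStream(compressed)
      Dim decompressor As Stream
      If useGZip Then
        decompressor = New GZipStream(source, CompressionMode.Decompress)
      Else
        decompressor = New DeflateStream(source, CompressionMode.Decompress)
      End If
      Using result As New MemoryStream()
        Dim chunk(4095) As Byte
        Dim bytesRead As Integer = decompressor.Read(chunk, 0, chunk.Length)
        While bytesRead > 0
          result.Write(chunk, 0, bytesRead)
          bytesRead = decompressor.Read(chunk, 0, chunk.Length)
        End While
        decompressor.Close()
        Return result.ToArray()
      End Using
    End Using
  End Function

  Sub Main()
    Dim data(4999) As Byte ' 5000 zero bytes as sample input
    Console.WriteLine("Deflate: {0} bytes", Compress(data, False).Length)
    Console.WriteLine("GZip:    {0} bytes", Compress(data, True).Length)
    Console.WriteLine("Restored: {0} bytes", Decompress(Compress(data, False), False).Length)
  End Sub

End Module
```

The decompression side reads until the stream ends, so the same loop works for both formats.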

Limitations of GZipStream and DeflateStream in .NET Framework 2.0

Compared with popular third-party products, GZipStream has certain limitations.

·         One major limitation is that you cannot store a file name in the gzip header. You therefore have to note the file name and extension at compression time and send them to the destination end where decompression is done; if the file type (extension) is not known at the destination, it is difficult to open the decompressed file in the proper program. The gzip specification makes the file name field optional, and the GZipStream class does not write it to the header as metadata. Headers that include the file name can be written with other classes, such as NamedGZipStream.

·         Files compressed with GZipStream can be decompressed by other archivers that understand the gzip format (WinZip, WinRAR, etc.). The reverse does not hold: DeflateStream and GZipStream cannot decompress the archives that WinZip or WinRAR normally create, because those are ZIP and RAR archives, which are container formats with their own headers and metadata.

·         These classes should not be used on files that are already compressed by some other compression algorithm; there is little redundancy left to remove, and the output may even be larger than the input.

·         Streams of at most 4 GB can be used with these classes.
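Since GZipStream reads only gzip data, it can help to check the first bytes of a file before attempting decompression. The following sketch is an assumption-labelled illustration: the module FormatSniffSketch and both function names are hypothetical. It relies on two documented signatures: gzip data starts with the bytes 1F 8B, and a ZIP archive normally starts with the local file header signature 50 4B 03 04 ("PK" followed by 03 04).

```vb
Imports System

Module FormatSniffSketch

  ' True when the data starts with the gzip magic number (1F 8B).
  Public Function LooksLikeGZip(ByVal data() As Byte) As Boolean
    Return data IsNot Nothing AndAlso data.Length >= 2 _
      AndAlso data(0) = &H1F AndAlso data(1) = &H8B
  End Function

  ' True when the data starts with the ZIP local file header signature.
  Public Function LooksLikeZipArchive(ByVal data() As Byte) As Boolean
    Return data IsNot Nothing AndAlso data.Length >= 4 _
      AndAlso data(0) = &H50 AndAlso data(1) = &H4B _
      AndAlso data(2) = &H3 AndAlso data(3) = &H4
  End Function

  Sub Main()
    Dim gzipStart() As Byte = New Byte() {&H1F, &H8B, 8, 0}
    Dim zipStart() As Byte = New Byte() {&H50, &H4B, 3, 4}
    Console.WriteLine("gzip sample recognized as gzip: {0}", LooksLikeGZip(gzipStart))
    Console.WriteLine("zip sample recognized as gzip:  {0}", LooksLikeGZip(zipStart))
    Console.WriteLine("zip sample recognized as ZIP:   {0}", LooksLikeZipArchive(zipStart))
  End Sub

End Module
```

A check like this lets an application report "this is a ZIP archive, not gzip data" instead of failing inside the decompression call.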

Conclusion

We can save GZipStream compressed files with any extension, such as .zip, .rar, .dat, etc. The extension does not change the content, which stays in gzip format; an archiver such as WinZip or WinRAR can decompress the file as long as it recognizes gzip data. GZIP and DEFLATE compression can be applied to many formats of data and files. Transferring large amounts of text-based data in XML format has become common in day-to-day business, and applying GZIP compression there can improve the efficiency of the transfer considerably.

The functions implemented above are most useful when transferring large data or files to remote computers over the network. We can achieve better transfer rates with less network congestion, and save disk space as well.

Abhishek Kumar Singh
Mindfire Solutions

