This example shows how to create a screen-scrape viewer that displays the selected URL in several formats, including a hex dump. I remember reading an old Petzold book (perhaps his first Windows book) in which he said the first program he wrote for any new platform was a file hex dump utility, both to learn the platform and to have a basic debugging tool. I've updated that idea here to use Internet "screen-scraping" rather than just the local file system. I use it when "View Source" in the browser doesn't give me enough detail, particularly about the newline codes, tabs and other special characters I may need to handle when parsing the data on the page.
The ASPX page includes a TextBox into which the user enters the URL. In addition to a Submit button, there are three radio buttons for choosing the display format: Hex, HTML(Ascii) and Web. As you will see later in the code, the last option (Web) opens the URL in a new browser window using JavaScript generated in the code-behind. The remaining controls are the output TextBox and an error Label, which is visible only when there is a problem.
<BODY>
  <FORM id="HexDump" method="post" runat="server">
    <asp:textbox id="UrlCtrl" runat="server" HEIGHT="28px" WIDTH="701px"></asp:textbox>
    <BR>
    <FONT size="-2"><B>Enter the url like http://www.microsoft.com</B></FONT>
    <BR>
    <INPUT type="submit" value="Submit">
    <asp:radiobutton id="HexBtn" runat="server" AUTOPOSTBACK="True"
        Checked="True" GroupName="DisplayType" Text="Hex"></asp:radiobutton>
    <asp:radiobutton id="AsciiBtn" runat="server" AUTOPOSTBACK="True"
        GroupName="DisplayType" Text="HTML(Ascii)"></asp:radiobutton>
    <asp:radiobutton id="WebBtn" runat="server" AUTOPOSTBACK="True"
        GroupName="DisplayType" Text="Web"></asp:radiobutton>
    <BR>
    <asp:label id="ErrorLbl" runat="server" Width="822px" Height="37px"
        FORECOLOR="Red"></asp:label>
    <BR>
    <asp:textbox id="DisplayCtrl" runat="server" Width="95%"
        Height="300px" TEXTMODE="MultiLine"></asp:textbox>
  </FORM>
</BODY>
In the code below, everything happens in the Page_Load method during a postback. Besides the Submit button, each radio button also causes a postback (AUTOPOSTBACK="True"), so the page reloads and shows the output in the requested format.
First, the URL is checked to make sure it isn't empty, and "http://" is prepended if it starts with "www". If the Web radio button is checked, I generate some JavaScript that opens the URL in a new browser window. This lets me keep viewing the page on which I'm inspecting the hexadecimal output.
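The URL handling described above can be made a little more defensive. The following is a minimal sketch, not the article's code: UrlHelpers, NormalizeUrl, JsEscape and BuildOpenWindowScript are hypothetical names. It assumes you want to reject input that still isn't an absolute URI after "http://" is prepended, and it escapes the URL before concatenating it into the window.open() script, since a quote typed into the TextBox would otherwise end the JavaScript string early.

```csharp
using System;
using System.Text;

// Hypothetical helpers for the URL step of Page_Load (not the article's code).
public static class UrlHelpers
{
    // Trims the input, prepends "http://" to bare "www..." input, and returns
    // null if the result is empty or still not an absolute URI.
    public static string NormalizeUrl(string input)
    {
        if (string.IsNullOrWhiteSpace(input))
            return null;
        string url = input.Trim();
        if (url.StartsWith("www", StringComparison.OrdinalIgnoreCase))
            url = "http://" + url;
        Uri parsed;
        return Uri.TryCreate(url, UriKind.Absolute, out parsed) ? url : null;
    }

    // Escapes text for a single-quoted JavaScript string literal: characters
    // outside a conservative URL-safe ASCII set become \uXXXX escapes, so quotes,
    // backslashes and "<" cannot end the string or the <script> element.
    public static string JsEscape(string s)
    {
        const string safe =
            "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789./:-_?=&%#~+,;@";
        StringBuilder sb = new StringBuilder();
        foreach (char c in s)
        {
            if (safe.IndexOf(c) >= 0)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }

    // Same window.open() call as the article, but with the URL escaped.
    public static string BuildOpenWindowScript(string url)
    {
        return "<script type='text/javascript'>window.open('" +
               JsEscape(url) + "','WebView');</script>";
    }
}
```

With these helpers, Page_Load would call SetError when NormalizeUrl returns null and pass BuildOpenWindowScript(url) to RegisterStartupScript instead of building the script string inline.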
Thanks to the WebClient class, the actual screen-scrape takes just two lines of code in the "try" block: DownloadData fetches the raw bytes, and the bytes are then decoded as UTF-8 text. If this succeeds, the text is displayed either as-is (HTML) or in its hexadecimal representation.
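The decoding step always assumes UTF-8, which garbles pages served in a different character set. A minimal sketch of one alternative, assuming the server's Content-Type header can be trusted: read its charset parameter with System.Net.Mime.ContentType and fall back to UTF-8 when it is missing, malformed or unknown. CharsetHelpers and GetEncodingFromContentType are hypothetical names, and the sketch ignores any charset declared in a <meta> tag inside the page itself.

```csharp
using System;
using System.Net.Mime;
using System.Text;

// Hypothetical helper: pick a decoder from a Content-Type header value
// instead of always using UTF-8 (not the article's code).
public static class CharsetHelpers
{
    public static Encoding GetEncodingFromContentType(string contentType)
    {
        if (!string.IsNullOrEmpty(contentType))
        {
            try
            {
                string charset = new ContentType(contentType).CharSet;
                if (!string.IsNullOrEmpty(charset))
                    return Encoding.GetEncoding(charset.Trim('"'));
            }
            catch (FormatException) { }   // malformed header: fall through
            catch (ArgumentException) { } // unknown charset name: fall through
        }
        // default: the article's original choice
        return new UTF8Encoding();
    }
}
```

In the "try" block you would keep the WebClient in a using block, call DownloadData as before, and decode with CharsetHelpers.GetEncodingFromContentType(client.ResponseHeaders["Content-Type"]).GetString(bytes); WebClient fills in ResponseHeaders once the download completes.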
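// at the top of the code-behind file
using System;
using System.Net;   // WebClient
using System.Text;  // UTF8Encoding, StringBuilder

private void Page_Load(object sender, System.EventArgs e)
{
    ErrorLbl.Visible = false;
    if ( this.IsPostBack )
    {
        string text;
        string url = UrlCtrl.Text.Trim();
        if ( url.Length == 0 )
        {
            SetError("You need to enter a URL");
            return;
        }
        // bare "www..." input gets an explicit scheme
        if ( url.Length > 3 && url.Substring(0,3).ToLower() == "www" )
            url = "http://" + url;
        if ( WebBtn.Checked )
        {
            // open the URL in a new browser window named "WebView";
            // note that url is concatenated into the script unescaped
            // (ASP.NET 2.0 and later: ClientScript.RegisterStartupScript)
            this.RegisterStartupScript("newwin",
                "<script language='javascript'>" +
                "window.open('" + url + "','WebView')</script>");
            return;
        }
        try
        {
            // download the raw bytes and decode them as UTF-8;
            // the using block disposes the WebClient afterwards
            using ( WebClient client = new WebClient() )
            {
                byte[] bytes = client.DownloadData( url );
                text = new UTF8Encoding().GetString( bytes );
            }
        }
        catch( Exception ex )
        {
            SetError( ex.Message );
            return;
        }
        if ( HexBtn.Checked )
            DisplayCtrl.Text = GetHex( text );
        else
            DisplayCtrl.Text = text;
    }
}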
The GetHex function converts the input string into display text with 16 characters per line: their codes as hex values on the left, followed by the characters themselves on the right (non-printable characters appear as '.'). Because it works on the decoded string, a character above U+00FF prints as more than two hex digits and throws off the alignment. I didn't spend a lot of time here, but I wanted to show one of the basic uses of the StringBuilder class, namely the Append method. Those of us with an MFC background tend to expect these methods to be part of the basic String class, since MFC's CString is mutable. Because .NET strings are immutable, however (as they should be), we use StringBuilder instead. It is well worth studying this class to learn everything it can do.
public static string GetHex( string txt )
{
    int i;
    // disp holds the printable characters for the current 16-character line
    StringBuilder disp = new StringBuilder();
    // hex is the complete output (hex columns + disp column)
    StringBuilder hex = new StringBuilder();
    for( i = 0; i < txt.Length; i++ )
    {
        if ( i > 0 )
        {
            if ( i % 16 == 0 )
            {
                // end the current line with its character column
                hex.Append( "  " + disp.ToString() + "\r\n" );
                disp.Length = 0;
            }
            else if ( i % 8 == 0 )
            {
                // separator between the two groups of 8
                hex.Append( "- " );
            }
        }
        hex.Append( string.Format( "{0:x2} ", (int)txt[i] ) );
        // printable ASCII as-is, everything else (including DEL) as '.'
        if ( txt[i] >= ' ' && txt[i] < 127 )
            disp.Append( txt[i] );
        else
            disp.Append( '.' );
    }
    // end of text - pad the last (partial) line so its character column lines up
    if ( disp.Length > 0 )
    {
        if ( disp.Length <= 8 )
            hex.Append( "  " );   // width of the missing "- " separator
        for( i = disp.Length; i < 16; i++ )
            hex.Append( "   " );  // width of one missing "xx " entry
        hex.Append( "  " + disp.ToString() );
    }
    return hex.ToString();
} // end GetHex( txt )

private void SetError( string err )
{
    ErrorLbl.Visible = true;
    ErrorLbl.Text = err;
}
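Because GetHex works on the decoded string, it never shows the raw bytes the server actually sent: a multi-byte UTF-8 sequence collapses into a single character, and invalid sequences are replaced by U+FFFD before they reach the dump. A minimal sketch of a byte-level alternative, assuming you pass in the byte[] from DownloadData directly; HexDumpBytes and HexDump are hypothetical names, and the layout follows GetHex (two groups of eight values separated by "- ", then the character column).

```csharp
using System;
using System.Text;

// Hypothetical byte-level variant of GetHex (not the article's code).
public static class HexDumpBytes
{
    public static string HexDump(byte[] data)
    {
        StringBuilder hex = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += 16)
        {
            if (offset > 0)
                hex.Append("\r\n");                    // end the previous line
            StringBuilder disp = new StringBuilder();  // character column for this line
            for (int j = 0; j < 16; j++)
            {
                int idx = offset + j;
                bool present = idx < data.Length;
                if (j == 8)
                    hex.Append(present ? "- " : "  "); // group separator, or its padding
                if (present)
                {
                    byte b = data[idx];
                    hex.Append(b.ToString("x2")).Append(' ');
                    // printable ASCII as-is, everything else as '.'
                    disp.Append(b >= 0x20 && b < 0x7f ? (char)b : '.');
                }
                else
                {
                    hex.Append("   ");                  // pad a missing "xx " entry
                }
            }
            hex.Append("  ").Append(disp.ToString());
        }
        return hex.ToString();
    }
}
```

In Page_Load you could then call HexDumpBytes.HexDump(bytes) when the Hex button is checked and keep the decoded text only for the HTML(Ascii) view; the UTF-8 bytes of "é", for example, show up as "c3 a9" rather than as one character.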
Conclusion
I've thrown the GetHex method into the utilities toolbox I use for general debugging. I also use it in a WinForms application for viewing local files, and I can use this web version to look at files on the server by entering "file://" instead of "http://". My next effort is going to be figuring out how to use "ftp://".
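As a sketch of the "file://" trick, assuming a .NET runtime where WebClient still handles file URLs through FileWebRequest: the same DownloadData call works once a local path is turned into a file:// URL with System.Uri. FileUrlDemo and DownloadLocalFile are hypothetical names. Newer .NET versions mark WebClient as obsolete, although it still compiles and runs. Keep in mind that on a public server this lets anyone who can reach the page read any file the ASP.NET process account can read.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical demo (not the article's code): WebClient.DownloadData
// accepts a file:// URL just as it accepts an http:// one.
public static class FileUrlDemo
{
    public static byte[] DownloadLocalFile(string path)
    {
        // an absolute path becomes a file:// URI,
        // e.g. C:\temp\a.txt -> file:///C:/temp/a.txt
        string fileUrl = new Uri(Path.GetFullPath(path)).AbsoluteUri;
        using (WebClient client = new WebClient())
        {
            return client.DownloadData(fileUrl);
        }
    }
}
```

The returned bytes can be passed to the same hex-dump code as a downloaded web page, which is handy for checking the line endings of a file that lives on the server.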
Downloads
You can download HexDump.zip if you want to use these two files and don't like to type.