CodeSnip: Using the IsMatch method in Regular Expressions to screen scrape a webpage
Published: 21 Apr 2006
Unedited - Community Contributed
Abstract
In this article, Steve demonstrates how to use the IsMatch method of the Regex class to screen scrape a webpage.
by Web Team at ORCS Web

I discovered this code tip while developing a web service that "screen scrapes" a webpage to determine whether a certain text phrase is present. Regular expressions are well suited to this task; however, they are not the easiest thing to learn.

The Regex class in the System.Text.RegularExpressions namespace of .NET 2.0 has a handy method called IsMatch that does exactly what I needed. The code snippet below accepts two arguments (the URL to monitor and the text to search for), makes an HTTP request, and reads the response stream line by line, checking each line for the text passed into the method. One thing I discovered while using the IsMatch method is that the search is case and space sensitive. For example, if you are searching "http://www.iislogs.com" for text in the page title, you must search for the exact phrase "IIS Logs -". Note also that the search text is interpreted as a regular expression pattern, so characters such as "." or "(" have special meanings unless they are escaped.
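The short console program below is a minimal sketch of the case and space sensitivity described above. The module name IsMatchDemo, the helper ContainsPhrase, and the sample title line are hypothetical, made up for illustration. The helper shows one possible way around both issues: Regex.Escape turns the search text into a literal pattern, and RegexOptions.IgnoreCase makes the comparison case insensitive.

```vb
Imports System
Imports System.Text.RegularExpressions

Module IsMatchDemo
    ' Hypothetical helper: treats phrase as literal text and ignores case.
    Function ContainsPhrase(ByVal text As String, ByVal phrase As String) As Boolean
        ' Regex.Escape makes characters such as ".", "(" and "?" match literally
        Dim pattern As String = Regex.Escape(phrase)
        Return Regex.IsMatch(text, pattern, RegexOptions.IgnoreCase)
    End Function

    Sub Main()
        ' Sample line assumed for illustration only
        Dim title As String = "<title>IIS Logs - Home</title>"

        ' Default IsMatch: case sensitive, and spaces must match exactly
        Console.WriteLine(Regex.IsMatch(title, "IIS Logs -"))   ' True
        Console.WriteLine(Regex.IsMatch(title, "iis logs -"))   ' False (case differs)
        Console.WriteLine(Regex.IsMatch(title, "IIS  Logs"))    ' False (two spaces)

        ' Unescaped, "." matches any character; escaped, it matches only a dot
        Console.WriteLine(Regex.IsMatch("wwwxiislogsxcom", "www.iislogs.com"))  ' True
        Console.WriteLine(ContainsPhrase("wwwxiislogsxcom", "www.iislogs.com"))  ' False

        ' With the helper, case no longer matters
        Console.WriteLine(ContainsPhrase(title, "iis logs -"))  ' True
    End Sub
End Module
```

Whether to escape depends on the caller: if you want to pass real regular expressions into the search, leave the argument unescaped as Listing 1 does.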

Listing 1

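' File-level imports (place these at the top of the code file)
Imports System.Net
Imports System.Text.RegularExpressions

Public Function URLListed(ByVal URL As String, ByVal strArgument As String) As String
  ' Returns "Listed", "Not Listed", or "Error: <message>"
  Dim strResult As String
  strResult = readWebPage(URL, strArgument)
  Return strResult
End Function

Private Function readWebPage(ByVal strSource As String, ByVal strArgument As String) As String
  Dim strLine As String
  Dim objSR As System.IO.StreamReader = Nothing
  Dim objResponse As WebResponse = Nothing
  Dim objRequest As WebRequest

  Try
    ' Created inside Try so an invalid URL is reported as an error, not thrown
    objRequest = WebRequest.Create(strSource)
    objResponse = objRequest.GetResponse()
    ' UTF-8 rather than ASCII so non-ASCII characters are not replaced with "?"
    objSR = New System.IO.StreamReader(objResponse.GetResponseStream(), System.Text.Encoding.UTF8)

    ' Test each line; strArgument is treated as a regular expression pattern
    Do While Not objSR.EndOfStream
      strLine = objSR.ReadLine()
      If Regex.IsMatch(strLine, strArgument) Then
        Return "Listed"
      End If
    Loop

    Return "Not Listed"

  Catch f As Exception
    Return "Error: " & f.Message
  Finally
    ' Runs on every exit path, including the early Return above
    If objSR IsNot Nothing Then objSR.Close()
    If objResponse IsNot Nothing Then objResponse.Close()
  End Try
End Function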
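Because readWebPage tests one line at a time, a phrase that the page's HTML happens to break across two lines will not be found. The sketch below is a hypothetical variant, not part of the original listing: SearchReader runs the same line-by-line IsMatch loop over any TextReader, so the behavior can be tried against an in-memory StringReader without making an HTTP request. The module name, helper name, and sample HTML are assumptions for illustration.

```vb
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module LineSearchDemo
    ' Hypothetical helper: the same loop as readWebPage, but reading from
    ' any TextReader instead of an HTTP response stream.
    Function SearchReader(ByVal reader As TextReader, ByVal pattern As String) As String
        Dim line As String = reader.ReadLine()
        Do While line IsNot Nothing
            If Regex.IsMatch(line, pattern) Then
                Return "Listed"
            End If
            line = reader.ReadLine()
        Loop
        Return "Not Listed"
    End Function

    Sub Main()
        Dim nl As String = Environment.NewLine
        ' Sample page in which the title text is split across two lines
        Dim html As String = "<html>" & nl & "<title>IIS Logs -" & nl & " Home</title>" & nl & "</html>"

        ' Found: the whole phrase sits on one line
        Console.WriteLine(SearchReader(New StringReader(html), "IIS Logs -"))       ' Listed

        ' Not found: each line is tested separately, and "- Home" spans two lines
        Console.WriteLine(SearchReader(New StringReader(html), "IIS Logs - Home"))  ' Not Listed
    End Sub
End Module
```

If the phrase you are monitoring might span lines, one option is to read the whole response with StreamReader.ReadToEnd and call Regex.IsMatch once on the full text.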

Conclusion

I hope this example helps in your Regular Expressions adventure. Happy coding!

 




©Copyright 1998-2024 ASPAlliance.com