.NET Screen Scraping in depth
Published: 30 Oct 2003
Unedited - Community Contributed
Abstract
Everything you need to know about screen scraping, from simply pulling down a page to more complex tasks like submitting forms and handling cookies. Here you will learn how to use the WebClient and HttpWebResponse classes, and which is better for which task.
by Damian Manifold

Introduction

ASPAlliance has published articles on data scraping before; today we will look at the different techniques in depth. .NET provides the WebRequest class for accessing data over the web. We will look at two ways of using it: WebClient, a simpler class built on top of WebRequest, and HttpWebRequest with its matching HttpWebResponse.

Either can handle most scraping tasks; it is more a case of which to use for which job.

Here we will cover everything you would want to do with the two classes and see which comes out best.
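
As a rough orientation before the details, here is a minimal sketch of the same page fetch done both ways. It is not the code from the later pages of this article; the class name ScrapeSketch, the method names, and the user-agent string are assumptions made up for illustration.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class, for illustration only.
public static class ScrapeSketch
{
    // WebClient: a single call returns the whole page as a string.
    public static string FetchWithWebClient(string url)
    {
        using (var client = new WebClient())
        {
            return client.DownloadString(url);
        }
    }

    // HttpWebRequest: build the request first, so that headers, cookies
    // and other settings can be adjusted before anything is sent.
    public static HttpWebRequest BuildRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "GET";
        request.UserAgent = "ScrapeSketch/1.0"; // made-up value
        return request;
    }

    // Send the request and read the response body to the end.
    public static string FetchWithHttpWebRequest(string url)
    {
        HttpWebRequest request = BuildRequest(url);
        using (var response = (HttpWebResponse)request.GetResponse())
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Calling ScrapeSketch.FetchWithWebClient("http://aspalliance.com/") or ScrapeSketch.FetchWithHttpWebRequest("http://aspalliance.com/") should return the same HTML. The first is shorter; the second exposes the request and response objects, which is where the extra control comes from.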



User Comments

Title: Awesome Article   
Name: Sandy Grupe
Date: 6/22/2012 4:19:09 AM
Comment:
Found it useful
Title: Thanks for sharing this   
Name: Evgeniy
Date: 6/14/2011 2:28:07 PM
Comment:
Nice article. Usually I use commercial libraries for .NET scraping, like Scraper or Gogybot. I think your article describes how these libraries work internally.
Thanks!
Title: error in links   
Name: harish
Date: 4/7/2011 8:11:53 AM
Comment:
Most of the links on this website are not working.
Title: Frustrating code
Name: Milind
Date: 5/14/2010 3:28:08 AM
Comment:
The article seems to be good and looks like it covers all the aspects. But damn, I am not able to see the code! It just keeps giving a 503 error.

Hello Admin/Webmaster, any reason? When can we expect this to be fixed?
Thanks
Milind
Title: Great Article   
Name: Seamus McMahon
Date: 4/21/2010 12:24:01 PM
Comment:
This article is very useful. I have been reading up on screen scraping, and in particular entering data into forms, but very little of the subject is covered in any books. This has been very helpful.
Title: mr   
Name: G P Zob
Date: 12/3/2009 6:50:51 AM
Comment:
what happens if the page you are scraping errors? Do you get the error page in the response stream? No. So how do you display the scraped error page in the scraping page? Any ideas?
Title: How does this translate for using with Siebel?   
Name: Marc Tucker
Date: 10/13/2009 12:23:09 PM
Comment:
I am interested in the code you use to do this in Siebel. I am using Siebel 7.8 currently, and my manager has asked if we can screen scrape data from the Siebel screen to populate some notifications to our sales reps. I can use your example to scrape basic info, but how do I drill into the specific frame and/or object to get the data I'm wanting to retrieve?
Title: Service Unavailable error   
Name: H Yeung
Date: 9/8/2009 5:31:48 PM
Comment:
Is the service down? I received a Service Unavailable error when I tried to see the source.
Title: Scrape Specifically
Name: Ross
Date: 4/4/2008 1:54:26 PM
Comment:
I want to scrape only specific content of a site. Is it possible?
Title: Scraping w/o request   
Name: John McKenney
Date: 3/7/2008 8:34:05 AM
Comment:
How do I implement a scrape if I cannot request the URI? Meaning, I have to read a static HTML page served by PeopleSoft; I cannot request the URL, I already have the page. I have a VB.NET app that I want to read a certain piece of data from that static page. Any pointers?
Title: Screen scraping from embedded ActiveX controls?
Name: Marc Tucker
Date: 2/24/2008 7:25:07 AM
Comment:
I have a Siebel application that we copy and paste values out of and into another app made in VB.NET. We've sent in an enhancement request to the dev team to get the data an easier way, but it's not a top priority for them. Can we use screen scraping to extract the data from the specific applet in question? Siebel also uses frames within frames within frames. I have tried mapping out the frames to access the data via the DOM, but that isn't getting me the info I want and need to know.
Title: Correct   
Name: cdahlkvist
Date: 12/18/2007 11:51:17 AM
Comment:
Yes, sorry, I decided to just paste his code. I actually used System.Net.HttpWebRequest.

The problem was with my .Net assembly folder. I did a 2.0 repair and it worked fine after.

My apologies.
Title: Fortunate   
Name: Brendan
Date: 12/18/2007 9:14:39 AM
Comment:
It isn't Article.HttpWebRequest; that is why you get an error. You should be using System.Net.HttpWebRequest like the author does in this article.
Title: Unfortunate   
Name: cdahlkvist
Date: 12/17/2007 2:36:42 PM
Comment:
This doesn't actually work. Consistently getting errors as follows:

Could not load type 'Article.httpWebRequest'
Title: screen scraping with all the links having absolute paths
Name: Ross
Date: 12/7/2007 9:57:12 AM
Comment:
I need more information on scraping pages so that the links have absolute paths and can be mapped to the local web application.
Title: re:Webscrap a Website   
Name: DamianM
Date: 11/9/2007 6:13:56 AM
Comment:
It would be hard for me to say if you're doing anything wrong; this would depend on the site you are scraping. You need to mimic 100% what the browser is doing. It is possible to do what you want, it is just tricky sometimes.
Title: Webscrap a Website   
Name: Sandeep
Date: 11/9/2007 3:33:25 AM
Comment:
I tried passing values, but it didn't work. Am I doing anything wrong? What I want the webscraper program to do is pass the login id and password to the login page and invoke the "LonIn" button click event, so that I get the response and the page after the login page is then called. Is it possible?
Title: re:Webscrap a Website   
Name: DamianM
Date: 11/8/2007 4:33:42 AM
Comment:
Read the passing forms section. The password and user name are probably passed as a form. To simulate pressing the submit button, you will need to pass the form to whatever URL the form submits to.
Title: Webscrap a Website   
Name: Sandeep
Date: 11/8/2007 2:41:47 AM
Comment:
Can we pass login id and password to a particular website and invoke the button click event, if yes how?

I want to do web scraping for a web site which asks for a userid and password (which I have). How do I pass this info to the website, and how do I invoke the button click event so that it will execute the code behind that button and give a response?
Also, once in, I want to perform various tasks like buying one product out of many and finally making payment using a credit card. All this needs to be done using web scraping.
Title: Re:web Scraping   
Name: DamianM
Date: 11/5/2007 7:00:56 AM
Comment:
>How can I run the code (on target URL e.g. Login Page) written on button click using web scraping?

There is no set answer; you would have to mimic what the button click did.
Title: web Scraping   
Name: Sandeep
Date: 11/3/2007 8:07:09 AM
Comment:
How can I run the code (on target URL e.g. Login Page) written on button click using web scraping?
Title: Source Code   
Name: Code?
Date: 10/5/2007 8:25:38 AM
Comment:
The source code seems to display fine; it's just some of the examples that do not work.
Title: Pages can't load   
Name: Someone interested in this topic
Date: 10/4/2007 10:07:04 PM
Comment:
Looks like a great article where the author intends to show the full code directly. But the thing is... the pages can't load, and I see error pages all the time :(
Title: Too bad...   
Name: Can't see code
Date: 4/26/2007 3:34:08 AM
Comment:
Would be a great article, but something is wrong with the ASPAlliance setup here. Saw a couple of the examples yesterday, but today all I get is the "500" error.
Title: Where's the code?   
Name: Where's the code?
Date: 2/1/2007 3:38:30 PM
Comment:
Where's the code?
Title: Solution to Exception Details: System.Net.WebException: The underlying connection was closed: The remote name could not be resolved.   
Name: Digant Desai
Date: 2/1/2007 1:55:22 AM
Comment:
Try adding the proxy server property with objReq.Proxy = new WebProxy("ProxyServerName")
Title: .NET File Posting   
Name: Natalia
Date: 12/1/2006 1:42:38 PM
Comment:
Please point me to an article on uploading files along with form elements using WebRequest or WebClient.
Title: Any solution to my above problem   
Name: Ritesh
Date: 8/2/2006 4:26:09 AM
Comment:
Is there any solution, or has anyone ever encountered this?
Title: Error : The underlying connection was closed: The remote name could not be resolved.   
Name: Ritesh
Date: 8/2/2006 4:25:11 AM
Comment:
Exception Details: System.Net.WebException: The underlying connection was closed: The remote name could not be resolved.

Source Error:


Line 39: HttpWebResponse objRes;
Line 40: objReq = (HttpWebRequest)WebRequest.Create("http://aspalliance.com/damianm/");//("http://news.google.com/news?ned=us&topic=h&output=atom");
Line 41: objRes = (HttpWebResponse)objReq.GetResponse();
Line 42: objSReader = new StreamReader(objRes.GetResponseStream());
Line 43: #endregion
Title: Great Article   
Name: William
Date: 7/27/2006 2:22:20 PM
Comment:
This article is complete and exactly what I wanted to read!
Thanks!
-william,
Title: screen scraping
Name: sara
Date: 5/5/2006 4:24:31 AM
Comment:
Can I scrape pages in a for loop? If I repeat scraping inside a loop it is very slow. Is there any solution?
thanks,
sara
Title: screen scraping   
Name: satya
Date: 3/18/2006 8:43:05 AM
Comment:
This is useful for me: what the concept of screen scraping is and how it works.
Thank you
Title: Can't view code   
Name: Victor
Date: 1/5/2006 8:26:35 PM
Comment:
I can't view most of the code.
Title: xxx   
Name: Raj
Date: 12/29/2005 3:43:06 PM
Comment:
How can we upload/post a file? Does anyone know?

Raj
Title: javascript redirect   
Name: alex
Date: 7/28/2005 11:44:04 AM
Comment:
The articles are useful but they don't talk about page redirection. Does WebRequest or WebClient follow JavaScript page redirection?
Title: Lucid and informative   
Name: Mark
Date: 6/20/2005 2:01:10 AM
Comment:
A very clear run-through - many thanks!
Title: shite   
Name: dave
Date: 5/12/2005 4:16:30 AM
Comment:
I take it back... the article is useful, but the way that the source code is presented by ASPAlliance is very frustrating.

©Copyright 1998-2014 ASPAlliance.com