CodeSnip: Virtual Web Services Through Pattern Matching
by Rajesh Toleti

Step 1: Create a WSDL File

Listing 1: WSDL File

<?xml version="1.0" encoding="utf-8"?>
<wsdl:definitions
    xmlns:s="http://www.w3.org/2001/XMLSchema"
    xmlns:http="http://schemas.xmlsoap.org/wsdl/http/"
    xmlns:mime="http://schemas.xmlsoap.org/wsdl/mime/"
    xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/"
    xmlns:soap="http://schemas.xmlsoap.org/wsdl/soap/"
    xmlns:soapenc="http://schemas.xmlsoap.org/soap/encoding/"
    xmlns:s0="http://www.taryatechnologies.com"
    targetNamespace="http://www.taryatechnologies.com"
    xmlns:wsdl="http://schemas.xmlsoap.org/wsdl/">
  <wsdl:types/>
  <wsdl:message name="msgHttpGetIn" />
  <wsdl:message name="msgHttpGetOut" />
  <wsdl:portType name="ptypeHttpGet">
    <wsdl:operation name="GetTaryaServices">
      <wsdl:input message="s0:msgHttpGetIn"/>
      <wsdl:output message="s0:msgHttpGetOut"/>
    </wsdl:operation>
  </wsdl:portType>
  <wsdl:binding name="bindHttpGet"
      type="s0:ptypeHttpGet">
    <http:binding verb="GET"/>
    <wsdl:operation name="GetTaryaServices">
      <http:operation location="/aboutus.asp"/>
      <wsdl:input>
        <http:urlEncoded/>
      </wsdl:input>
      <wsdl:output>
        <tm:text>
          <tm:match
              name='myServices'
              pattern='&lt;ul&gt;(.*?)&lt;/ul&gt;'
              ignoreCase='true'
              repeats='100' />
        </tm:text>
      </wsdl:output>
    </wsdl:operation>
  </wsdl:binding>
  <wsdl:service name="TaryaService">
    <wsdl:port
        name="ptypeHttpGet"
        binding="s0:bindHttpGet">
      <http:address location="http://www.taryatechnologies.com" />
    </wsdl:port>
  </wsdl:service>
</wsdl:definitions>

 

I will not explain the WSDL format in detail, as it is outside the scope of this article. You can find the specification at http://www.w3.org/TR/wsdl.

In the above file, the values repeated in Listings 2 through 5 are the variables, which you have to change when you create your own WSDL file. I explain them below.

Listing 2:

xmlns:s0="http://www.taryatechnologies.com"
targetNamespace="http://www.taryatechnologies.com"


Set both values to the URL of the website from which you extract information. The http:address element near the end of Listing 1 uses the same URL.

Listing 3:

<wsdl:operation name="GetTaryaServices">


This is the method name. It can be anything you like, but it must be the same in the portType and in the binding, where it appears in Listing 1. You use it later in the code that consumes the web service.

Listing 4:

<http:operation location="/aboutus.asp"/>


This is the relative path to the specific file from which you extract information.
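As a rough illustration of how the two location values fit together, the sketch below joins the http:address location from Listing 1 with the http:operation location from Listing 4 to form the URL that is requested. It is written in Python purely for illustration, and the helper name request_url is made up for this example; it is not something the WSDL tooling provides.

```python
def request_url(address: str, operation_location: str) -> str:
    """Join the service address and the operation's relative location."""
    # Avoid a doubled or missing slash where the two parts meet.
    return address.rstrip("/") + "/" + operation_location.lstrip("/")


# Values copied from Listing 1.
print(request_url("http://www.taryatechnologies.com", "/aboutus.asp"))
```

To point the service at a different page on the same site, only the http:operation location would need to change.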

Listing 5:

<tm:match
    name='myServices'
    pattern='&lt;ul&gt;(.*?)&lt;/ul&gt;'
    ignoreCase='true'
    repeats='100' />


This is the most important part of the WSDL file. The name of the match element can be anything. The pattern attribute holds the regular expression that extracts the actual content from the website; ignoreCase makes the match case-insensitive, and repeats sets the maximum number of matches to return. You need to be skillful while writing this expression. There are good sources on the Net to learn pattern matching. One of them is available at http://www.evolt.org/node/22700.

Before writing the expression, you need to define exactly what you want from the website. Look at the HTML source of the web page from which you want to extract information. In our example, I want to extract the services offered by Taryatechnologies. View the HTML source of www.taryatechnologies.com/aboutus.asp. The piece of information we want is as follows (in HTML).
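To see what the pattern attribute looks like once an XML parser has read it, here is a small sketch using Python's standard xml.etree.ElementTree module. Python is used only for illustration; it is not the .NET tooling that consumes the WSDL file. The fragment is the tm:match element wrapped in tm:text, with the tm namespace declared so that it parses on its own. I am assuming the pattern is written with an explicit closing tag, &lt;/ul&gt;, and the names FRAGMENT and read_match are made up for this example.

```python
import xml.etree.ElementTree as ET

# The tm:match element, wrapped in tm:text with its namespace declared
# so that the fragment is a complete XML document.
# Assumption: the pattern ends with an explicit closing tag, &lt;/ul&gt;.
FRAGMENT = """
<tm:text xmlns:tm="http://microsoft.com/wsdl/mime/textMatching/">
  <tm:match name='myServices'
            pattern='&lt;ul&gt;(.*?)&lt;/ul&gt;'
            ignoreCase='true'
            repeats='100' />
</tm:text>
"""

TM_NS = "{http://microsoft.com/wsdl/mime/textMatching/}"


def read_match(xml_text: str) -> dict:
    """Return the attributes of the first tm:match element as a dict."""
    root = ET.fromstring(xml_text.strip())
    match = root.find(TM_NS + "match")
    return dict(match.attrib)


attrs = read_match(FRAGMENT)
# The parser has already turned &lt; and &gt; back into < and >.
print(attrs["pattern"])
```

The escaping is only needed because a literal < is not allowed inside an XML attribute value; the regular expression that is actually applied contains plain < and > characters.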

<ul>
  <li>Web Site Development</li>
  <li>Web Applications</li>
  <li>Web services</li>
  <li>Graphical Designs</li>
  <li>Mobile Applications</li>
  <li>Digital Signage solutions</li>
</ul>


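We want the information between the <ul> and </ul> tags. So our regular expression will be: &lt;ul&gt;(.*?)&lt;/ul&gt;

Because the pattern is an XML attribute value, < and > are written as &lt; and &gt;. Different expressions can produce the same result. The symbols used here have the following meanings:

() Groups part of the pattern and captures the text it matches.
. Matches any character except a newline.
* Matches the preceding item zero or more times.
? Matches the preceding item zero or one time; placed after * (as in .*?), it makes the match lazy, so it stops at the first possible closing tag.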
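The sketch below tries the expression on part of the services list. It uses Python's re module instead of the .NET regular-expression engine, so treat it as an approximation of what the web service does. I am assuming the pattern is written with an explicit closing tag, <ul>(.*?)</ul>. The second list ("Other list") and all variable names are made up for this example. The re.DOTALL flag is a Python feature shown only to illustrate the newline issue; it is not part of the WSDL.

```python
import re

PATTERN = r"<ul>(.*?)</ul>"

# Two lists on a single line; the second list is invented for the demo.
one_line = (
    "<p>Our services</p>"
    "<ul><li>Web Site Development</li><li>Web Applications</li></ul>"
    "<ul><li>Other list</li></ul>"
)

# Lazy matching stops at the first </ul>, so each list is found separately.
lazy = re.findall(PATTERN, one_line, re.IGNORECASE)

# Without the ?, .* is greedy and runs on to the last </ul>.
greedy = re.findall(r"<ul>(.*)</ul>", one_line, re.IGNORECASE)

# The same kind of list spread over several lines.
multi_line = "<ul>\n<li>Web services</li>\n<li>Graphical Designs</li>\n</ul>"

# "." stops at a newline, so the plain pattern finds nothing here.
no_match = re.findall(PATTERN, multi_line, re.IGNORECASE)

# re.DOTALL lets "." match newlines too.
with_dotall = re.findall(PATTERN, multi_line, re.IGNORECASE | re.DOTALL)

print(lazy)
print(greedy)
print(no_match)
print(with_dotall)
```

If the page source spreads the list over several lines, the plain expression may therefore need adjusting before it returns anything, so it is worth checking the raw HTML rather than the rendered page.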

 



User Comments

Title: great   
Name: Saif
Date: 2006-04-10 11:01:51 AM
Comment:
Thank you
Title: Good article   
Name: Chandan
Date: 2005-12-13 4:27:01 AM
Comment:
The article is good, and it gives a very good description about the virtual web service
Title: Great article   
Name: Kay Lee
Date: 2005-11-29 10:01:14 PM
Comment:
This is a great article, and I like that you wrote it up. I just wanted to point out that this capability may or may not become legally irritating. If the site you're leeching is considered a service for sale type of site, it can become a serious problem.

For the readers, please be discreet and courteous in regards to creating Virtual Web Services.
Title: Virtual Web Services Cannot Be Blocked   
Name: Deavon
Date: 2005-11-29 8:19:07 AM
Comment:
Steve Burch:

Unfortunately, there is no way to block a virtual web service; unless you block the content from the end user completely. Anything that can be represented by XML, HTML, RSS, or any other form of non-encrypted data storage can be reparsed and reused.

It is a similar concept to copy and pasting; and then reformatting with customized style sheets; except that, in the form of a VWS, that kind of functionality occurs automatically through compiled processing and parsing of the source file; to generate a SOAP file as the result.

Same data; different way of showing it.

Utilizing .NET's WSDL and Web Service architecture; it is easy to expose the data as a WSDL function; which is exactly what is occurring here.
Title: Virtual web service -- blocking?   
Name: steve burch
Date: 2005-11-27 5:35:01 PM
Comment:
you should show, if possible, how a web site can block someone from doing this.
Title: scan and read scriptedinfo   
Name: case
Date: 2005-11-25 1:54:18 AM
Comment:
How do you read a letter that's in scriped.
Title: Good Article   
Name: Pavan K
Date: 2005-11-23 5:44:54 AM
Comment:
It's a great article, as it gives complete insight into virtual web service. It's a highly recommended reading.








©Copyright 1998-2024 ASPAlliance.com