The one good thing about Netscape is the bookmark.htm file. If you have ever tried to copy all your URL files to a disk, you know how much longer that takes, and as for uploading a URL file, forget about it.
Here we will be looking at how to extract the bookmarks from the file.
Now, getting the href from an anchor tag is quite easy. Here we are also going to extract the path to each bookmark (the folders it sits in), and it is this that makes things a little more difficult.
The regular expression we use to do this is massive.
(Import.aspx.vb sample 1)
But do not worry; it is actually three straightforward expressions joined together.
It uses | (or) to match whichever one of the three patterns we are interested in comes next:
- an anchor tag, containing information about the bookmark
- a folder title, indicating entry into a new folder
- the literal </DL><p>, indicating the end of a folder
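The three alternatives above can be sketched as follows. This is a minimal illustration in Python rather than the VB.NET of the original listing, and the pattern is my own, not the article's expression. The names BOOKMARK_RE and extract_bookmarks are hypothetical, and the sketch assumes folder titles appear in H3 tags and that HREF values are double-quoted, as in a typical Netscape bookmark.htm.

```python
import re

# Three patterns joined with | ; named groups tell us which one matched.
# Illustrative only: this is not the expression from the original listing.
BOOKMARK_RE = re.compile(
    r'<A\s+[^>]*?HREF="(?P<href>[^"]*)"[^>]*>(?P<title>.*?)</A>'  # anchor tag
    r'|<H3[^>]*>(?P<folder>.*?)</H3>'                              # folder title
    r'|(?P<end></DL>\s*<p>)',                                      # end of folder
    re.IGNORECASE | re.DOTALL,
)


def extract_bookmarks(html):
    """Return (folder_path, title, href) tuples in document order."""
    path = []    # stack of the folders we are currently inside
    found = []
    for match in BOOKMARK_RE.finditer(html):
        if match.group("href") is not None:
            # A bookmark: record it with the current folder path.
            found.append(("/".join(path), match.group("title"), match.group("href")))
        elif match.group("folder") is not None:
            # A folder title: we are entering a new folder.
            path.append(match.group("folder"))
        elif path:
            # </DL><p>: leave the current folder (ignored at the top level).
            path.pop()
    return found
```

Keeping the open folders on a stack is what turns a flat run of matches into a path: each folder title pushes a name, and each </DL><p> pops one.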
Once you have extracted the bookmarks, you can display them in your own format, or go that little bit further and check that none of them are dead links.
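As a rough idea of what a dead-link check might look like, here is a small Python sketch using only the standard library. The function name is_dead and its opener parameter are hypothetical, introduced so the check can be exercised without a network connection. It assumes a link counts as dead when the request fails outright or the server answers with an HTTP error.

```python
import urllib.error
import urllib.request


def is_dead(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the URL cannot be fetched or answers with an HTTP error."""
    try:
        # HEAD asks for the headers only, so the page body is not downloaded.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status >= 400
    except (urllib.error.URLError, OSError, ValueError):
        # URLError covers HTTP errors and unreachable hosts, OSError covers
        # timeouts, and ValueError covers malformed URLs.
        return True
```

Some servers refuse HEAD requests, so a link flagged here may deserve a second try with a normal GET before you delete it.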
You might notice from the example that the dates are a little old. This is due to the fact that the bookmark.htm was created with a utility that does not really bother with the dates when converting. The code does work fine with a Netscape-produced bookmark.htm.