Index Server is a great product! On the administrative side, it is easy to install, performs well, and needs minimal maintenance once installed. Developing search applications in ASP is also fairly straightforward thanks to the Query and Utility server components.
The main limitation of Index Server is that it can really only index content held on the machine running the Index Server service, or on other servers on the same network. Although it is possible to set up a share to a Unix/Linux Apache web server using a file-sharing solution such as Samba, this isn't always satisfactory: Index Server is not case-sensitive with respect to filenames, which can cause problems when displaying search results.
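To illustrate the case-sensitivity problem, here is a minimal Python sketch. It assumes (the article does not spell out the exact failure) that a search hit may come back with different letter case from the file's real name on the Unix server, so a link built from the hit would break. The function names `build_case_map` and `restore_case` and the sample paths are hypothetical.

```python
def build_case_map(real_paths):
    """Map each lower-cased path to the real paths that share that spelling."""
    case_map = {}
    for path in real_paths:
        case_map.setdefault(path.lower(), []).append(path)
    return case_map


def restore_case(result_path, case_map):
    """Return the real path for a search hit, or None if missing or ambiguous."""
    candidates = case_map.get(result_path.lower(), [])
    # Two files that differ only by case cannot be told apart reliably.
    return candidates[0] if len(candidates) == 1 else None


if __name__ == "__main__":
    # Hypothetical file listing taken from the case-sensitive server.
    real_paths = ["/docs/ReadMe.html", "/docs/index.html", "/docs/Index.html"]
    case_map = build_case_map(real_paths)
    for hit in ("/DOCS/README.HTML", "/docs/index.html", "/docs/missing.html"):
        print(hit, "->", restore_case(hit, case_map))
```

A lookup like this only helps when no two files on the server differ by case alone. Where they do, the ambiguity cannot be resolved from the filename.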
Another issue is that it can be a chore to prevent Index Server from indexing certain content on a server. Unlike a web robot, it has no concept of the Robots Exclusion Standard (i.e. robots.txt files) and is unaffected by the 'robots' meta tag.
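For comparison, the following sketch shows how a well-behaved web robot might consult robots.txt rules before fetching a URL, using Python's standard `urllib.robotparser` module. The robots.txt content, the `ExampleRobot` user agent and the URLs are made up for illustration. This is not how any particular product implements the check.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; the paths are illustrative only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /drafts/
"""


def build_parser(robots_txt):
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser, url, user_agent="ExampleRobot"):
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in (
        "http://www.example.com/index.html",
        "http://www.example.com/private/report.html",
    ):
        print(url, "allowed" if is_allowed(parser, url) else "excluded")
```

A file-system indexer has no equivalent hook, which is why excluding content from Index Server has to be handled by other means.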
Retrieving and indexing content from a web server with a web robot solves these problems. A web robot mimics a web browser: it starts at one page of the site and follows the links it finds until it has retrieved every page it can reach. A robot can therefore potentially retrieve content from any web server, regardless of the platform it is hosted on. Two products that allow you to do this are Microsoft's Site Server 3.0 and the author's own Index Server Companion.
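The traversal just described can be sketched in a few lines of Python using only the standard library. This is a simplified illustration, not the implementation used by Site Server or the Index Server Companion. The names `LinkExtractor`, `fetch` and `crawl`, the same-host restriction and the `max_pages` limit are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page over HTTP and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first traversal of same-host links; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # skip pages that cannot be retrieved
        pages[url] = html
        extractor = LinkExtractor()
        extractor.feed(html)
        for href in extractor.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The retrieved pages could then be written to a local directory for indexing. A real robot would also honour robots.txt, throttle its requests and handle non-HTML content, all of which are omitted here.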
Microsoft Site Server 3.0
Microsoft's Site Server 3.0 software suite includes a Search application that enhances Index Server by allowing you to (amongst other things) retrieve and index content from remote websites using an integrated web robot. For an overview of Site Server 3.0 Search, take a look at an article I wrote for ariadne.ac.uk. Unfortunately, Site Server 3.0 Search has a few shortcomings, including:
- Site Server 3.0 isn't the easiest of applications to install.
- The product wasn't really designed for Windows 2000 Server.
- It doesn't appear that the product is still in active development.
- It isn't very useful if your websites are hosted by a third party, and they don't have Site Server 3.0 installed.
- Site Server 3.0 costs a lot of money, which cannot always be justified if you only want to use the Search application of the software suite.
Index Server Companion
The Index Server Companion is a cost-effective way of retrieving content from remote web servers for Index Server to index. It also allows content to be retrieved from ODBC databases, which Index Server can then index.