Index Server on Windows NT and the Indexing Service on
Windows 2000 offer good "out of the box" functionality for
building website search engines. Unfortunately, Index Server suffers from a
few issues that can cause security problems on a server.
· Index Server itself has had several security flaws, which Microsoft
resolved through a series of service packs.
· Because Index Server catalogs files on the file system, content you may
not want exposed can appear in search results.
· Index Server cannot differentiate between content files and website
structure files. Consequently, website include files and other structural
files can appear in search results.
A few years ago I built an add-on for Index Server called
the Index Server Companion that uses a web crawler to
retrieve a website's pages and make them available for cataloging by
Index Server (read more about the Index Server Companion at http://www.winnershtriangle.com/w/Products.IndexServerCompanion.asp).
The advantage of this approach is that since the website itself is crawled
rather than the underlying files, the content of each page appears exactly as
the end user would see it (i.e. all include files are included and ASP is
interpreted), and there is no risk of unintentionally indexing content that
should not appear in search results.
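The crawl-and-save idea can be sketched in a few lines. The following is a hypothetical Python illustration (not the Index Server Companion's actual code): fetch each page as the server renders it, extract its same-site links to find more pages, and write the rendered HTML to a directory that Index Server catalogs. The URLs and page markup here are made up for the example; the link extraction runs on an in-memory page so the sketch needs no network access.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class LinkExtractor(HTMLParser):
    """Collects absolute URLs from <a href="..."> tags in rendered HTML."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL.
                    self.links.append(urljoin(self.base_url, value))

def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    # Keep only links on the same host, so the crawl stays on one site.
    host = urlparse(base_url).netloc
    return [u for u in parser.links if urlparse(u).netloc == host]

# Demonstration on an in-memory page (hypothetical markup):
page = ('<p>See <a href="/about.asp">About</a> and '
        '<a href="http://other.example/x">elsewhere</a>.</p>')
links = extract_links(page, "http://www.example.com/default.asp")
print(links)  # → ['http://www.example.com/about.asp']
```

A real crawler would loop over the discovered links, fetch each one, and save the response body to the cataloged directory; the key point is that what gets saved is the page as served, not the raw source files.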
The other advantage is that the Index Server Companion obeys
web server robots.txt files conforming to the robots exclusion protocol, as
well as the robots meta tag in individual website pages.
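To illustrate what honoring these two mechanisms involves, here is a small Python sketch using the standard library's robots.txt parser plus a simple check for the robots meta tag. The robots.txt rules and the sample page below are hypothetical, and this is not the Index Server Companion's own implementation.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Rules as they might appear in a site's robots.txt (hypothetical content).
robots_txt = """\
User-agent: *
Disallow: /includes/
Disallow: /admin/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Paths under a Disallow rule must not be crawled; everything else may be.
print(rp.can_fetch("*", "http://www.example.com/includes/header.asp"))  # False
print(rp.can_fetch("*", "http://www.example.com/products.asp"))         # True

class RobotsMetaChecker(HTMLParser):
    """Detects a <meta name="robots" content="noindex"> tag in a page."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            d = dict(attrs)
            if (d.get("name", "").lower() == "robots"
                    and "noindex" in d.get("content", "").lower()):
                self.noindex = True

checker = RobotsMetaChecker()
checker.feed('<html><head><meta name="robots" content="noindex,nofollow">'
             '</head></html>')
print(checker.noindex)  # True: this page asked not to be indexed
```

A crawler that respects both signals skips disallowed paths entirely and discards pages whose meta tag requests "noindex", which is exactly what keeps include files and private areas out of the search catalog.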
Microsoft's Site Server 3.0 offered similar web-crawling
functionality through its Gatherer component, but unfortunately Site
Server is no longer available. Some of that functionality has been carried
over to Microsoft's SharePoint Portal Server, which sadly does not do exactly
what Site Server used to do.