As described earlier, user input should be cleaned up before it is used in SQL
queries to defend against SQL injection attacks. Unfortunately, malicious users
can use similar techniques to perform other kinds of attack. At the very least
these can be a nuisance, but they can also be used to mislead your website's
other users, or make it easier to launch cross-site scripting attacks and
phishing attempts.
Any user input that a web application gathers and then redisplays on a web page
should be cleansed before it is displayed. In particular, either remove all of
the HTML tags that a user may have entered or remove all HTML markup except for a
few "safe" tags, such as bold, italic and paragraph formatting tags. Particular
care should be taken to remove <script> tags entered by the user. Failing to
remove script tags allows a malicious user to run JavaScript within any page that
displays their input. This is a particular problem on websites where one user's
input is viewable by many people, such as bulletin boards and discussion forums.
At the very least, such a user might cause annoying JavaScript alert windows to
appear whenever someone views a page containing their script. More seriously,
they could set the JavaScript document.location property to make a visitor's
browser automatically go to a different URL when they visit your website.
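One way to keep a few safe tags, sketched below under stated assumptions, is to encode the entire input so that nothing in it is treated as markup, and then turn back only exact, attribute-free <b>, <i> and <p> tags and their closing forms. Restoring only attribute-free tags matters because attributes such as onmouseover can also carry script. EncodeHTML and AllowSafeTags are hypothetical helper names; EncodeHTML stands in for Server.HTMLEncode so the sketch can run outside ASP and covers only &, <, > and the double quote.

```vbscript
' Hypothetical stand-in for Server.HTMLEncode (covers & < > " only).
Function EncodeHTML(s)
    Dim t
    t = Replace(s, "&", "&amp;")     ' encode & first so the & in the entities below is not encoded again
    t = Replace(t, "<", "&lt;")
    t = Replace(t, ">", "&gt;")
    t = Replace(t, """", "&quot;")
    EncodeHTML = t
End Function

' Hypothetical allowlist filter: encode everything, then restore only
' <b>, <i>, <p> and their closing tags when they have no attributes.
Function AllowSafeTags(userInput)
    Dim re
    Set re = New RegExp
    re.Pattern = "&lt;(/?)(b|i|p)&gt;"   ' $1 = optional "/", $2 = tag name
    re.IgnoreCase = True
    re.Global = True
    AllowSafeTags = re.Replace(EncodeHTML(userInput), "<$1$2>")
    Set re = Nothing
End Function
```

In an ASP page this might be used as Response.Write AllowSafeTags(Request.Form("comment")), where "comment" is a hypothetical form field. Note that an unmatched <b> is restored as-is and could make the rest of the page bold, so a production filter would also need to balance tags.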
To remove all HTML tags from user input, a regular
expression substitution can be used, such as the one shown in the function
below.
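Function stripHTMLtags(HTMLstring)
    ' Remove anything that looks like a tag: "<", one or more non-">" characters, then ">"
    Dim RegularExpressionObject
    Set RegularExpressionObject = New RegExp
    With RegularExpressionObject
        .Pattern = "<[^>]+>"
        .IgnoreCase = True
        .Global = True    ' replace every match, not just the first
    End With
    stripHTMLtags = RegularExpressionObject.Replace(HTMLstring, "")
    Set RegularExpressionObject = Nothing
End Function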
Alternatively, the user's input can be converted using the Server.HTMLEncode
method in classic ASP or the HtmlEncode method of the HttpUtility class in the
.NET Framework. These methods replace characters such as < and > with HTML
entities (&lt; and &gt;), so any HTML the user entered is displayed as plain text
instead of being interpreted by the web browser.
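The difference between stripping and encoding shows up with legitimate input that happens to contain < and > characters. Because Server.HTMLEncode only exists inside an ASP page, this standalone sketch uses a hypothetical EncodeHTML function that encodes only &, <, > and the double quote (the real method may encode more characters), alongside a copy of the stripHTMLtags function.

```vbscript
' Same regular expression as stripHTMLtags.
Function stripHTMLtags(HTMLstring)
    Dim re
    Set re = New RegExp
    re.Pattern = "<[^>]+>"
    re.IgnoreCase = True
    re.Global = True
    stripHTMLtags = re.Replace(HTMLstring, "")
    Set re = Nothing
End Function

' Hypothetical stand-in for Server.HTMLEncode (covers & < > " only).
Function EncodeHTML(s)
    Dim t
    t = Replace(s, "&", "&amp;")     ' must run first
    t = Replace(t, "<", "&lt;")
    t = Replace(t, ">", "&gt;")
    t = Replace(t, """", "&quot;")
    EncodeHTML = t
End Function

Dim sample, stripped, encoded
sample = "1 < 2 and 3 > 2"
stripped = stripHTMLtags(sample)   ' "1  2": "< 2 and 3 >" was treated as a tag
encoded = EncodeHTML(sample)       ' "1 &lt; 2 and 3 &gt; 2": displays as typed
```

Inside an ASP page, calling Server.HTMLEncode directly gives the same kind of protection without a helper function.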
See the ASPAlliance articles http://authors.aspalliance.com/brettb/VBScriptRegularExpressions.asp
and http://aspalliance.com/555 for
further information about using regular expressions in ASP. Regular
expressions are also available in ASP.NET through the
System.Text.RegularExpressions namespace.
A safer alternative to allowing users to enter HTML is to let them use Bulletin
Board Code (BBCode), the text formatting system used on many web-based bulletin
boards. A number of ASP scripts are available that will safely convert BBCode
formatting to HTML.
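The core of such a conversion can be sketched as follows, assuming only [b] and [i] need to be supported. BBCodeToHTML and EncodeHTML are hypothetical names, not taken from any published script. The key step is encoding all raw HTML first, so the only tags in the output are the ones the converter itself emits. The lazy [\s\S]*? match converts only properly paired tags; an unclosed [b] is left as literal text.

```vbscript
' Hypothetical stand-in for Server.HTMLEncode (covers & < > " only).
Function EncodeHTML(s)
    Dim t
    t = Replace(s, "&", "&amp;")     ' must run first
    t = Replace(t, "<", "&lt;")
    t = Replace(t, ">", "&gt;")
    t = Replace(t, """", "&quot;")
    EncodeHTML = t
End Function

' Hypothetical BBCode converter supporting only [b]...[/b] and [i]...[/i].
Function BBCodeToHTML(userInput)
    Dim re, html, tags, i
    html = EncodeHTML(userInput)     ' neutralise any raw HTML the user typed
    tags = Array("b", "i")
    Set re = New RegExp
    re.IgnoreCase = True
    re.Global = True
    For i = 0 To UBound(tags)
        ' Lazy match pairs each opening tag with the nearest closing tag
        re.Pattern = "\[" & tags(i) & "\]([\s\S]*?)\[/" & tags(i) & "\]"
        html = re.Replace(html, "<" & tags(i) & ">$1</" & tags(i) & ">")
    Next
    Set re = Nothing
    BBCodeToHTML = html
End Function
```

For example, BBCodeToHTML("[b]bold[/b] <script>") returns "<b>bold</b> &lt;script&gt;", so the formatting survives while the injected tag is shown as text.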