As soon as ASP.NET systems grow to the point that more than
a single web server needs to be deployed, decisions need to be taken about
where the session state will be held. By default, ASP.NET session state lives
in the memory of the server hosting the application. The problem with this is
that a user who connects to one of our web servers will not have the same
session if their next request is sent to another one of our web servers, yet
this is exactly what we want to achieve to improve the robustness of our
systems.
There are a number of hardware-based solutions that involve
concepts like "sticky sessions" where the hardware that performs the
routing of requests from users remembers which server the user last spoke to
and directs all requests from the same user to that same server. The downside
of this is that if the server in question stops working, the user's session
will also be lost.
To avoid this, it is common to move the session state off
the individual web servers. Figure 1 shows one option. It involves setting up a
single server as the holder of all session state. This can be done by
implementing an operating system service that communicates with the web servers
and which holds the session state from all the web servers in its own memory.
Figure 1: Session State Service
The downside of such a service is that it will not survive a
reboot of the server holding the state; it is limited to the memory of the
server and it is a single point of failure for the entire web site.
To make the session state storage more persistent, a SQL
Server database can be introduced. It would address the first two issues but it
is only when we then also apply fault tolerance to the database server, that we
are able to address all three issues. The most common way to do this is to
introduce a SQL Server cluster for storing the session state as shown in Figure
2.
Figure 2: Clustered Session State Database
This is precisely what is provided by the ASP.NET session
state database. While there is some performance impact of having to retrieve
the session state from another server and more particularly from a database
rather than from memory, the tradeoff is usually considered worthwhile, given
the robustness of the solution.
In general I have found this works quite well. However,
recently, I have been doing some consulting and performance tuning work at a
busy site that was having difficulties specifically with the performance of
their session state database. At this site, session state was quite large,
often between 500kb and 3Mb in size. This had been working acceptably until
they started to have over around 4000 concurrent sessions.
Periodically, commands that were accessing the session state
database were experiencing timeouts. My first step was to use a SQL Profiler
trace to find the types of commands that were timing out. This showed that the
commands most commonly taking a long time were UPDATE commands. When I tested
them individually, each UPDATE statement took less than a second. The command
timeouts were set at the default value of thirty seconds. This then means only
one of two things. Either the system is so buried in work that it just can't get
to do the work in time (and that usually means disk performance) or users were
being blocked by other users.
Conceptually, this troubled me as a session state database
is unlike most other databases when you consider concurrency. It would be very
uncommon for more than one connection to be accessing the same row in the
session state database at the same time i.e., only one web server would be
reading or writing a given session row at any point in time. SQL Server has
implemented row level locking since version 7, and so a user updating one row
should not normally be blocking another user as they would not be working with
the same row.
As this was a SQL Server 2000 server, my first stop to
locate the blocking was the Current Activity window in the Management node of
SQL Server Enterprise Manager, as shown in figure 3. (In this example, I have
created a small number of blocked processes to make the view clear.)
Figure 3: Locks per Process ID in Enterprise
Manager
In this example "spid 55" is blocking "spid
56" and "spid 57". To find out what "spid 55" is
doing, you can click on the Process Info node. This is also under the Current
Activity node as shown in Figure 4.
Figure 4: Process Info in Enterprise Manager
This then lets us see the command being executed along with
details of who is executing it. In the case of the blocking at the user site,
the culprit was a seemingly innocuous background task set up as part of the
ASP.NET session state database setup script.
Because web users might simply walk away or browse away from
your site, there is a need to clean up expired sessions in the database. However,
it is possible when reading these rows to ignore any that have expired; your
database could quickly become full of them. To avoid this, a background job has
been set up to remove expired sessions every five minutes. The job simply
executed a procedure called DeleteExpiredSessions, as shown in Listing 1.
Listing 1: Standard DeleteExpiredSessions Code
CREATE PROCEDURE DeleteExpiredSessions
AS
DECLARE @now DATETIME
SET @now = GETUTCDATE()
DELETE ASPState..ASPStateTempSessions
WHERE Expires < @now
RETURN 0
GO
While this seems very tame, if the session state rows are
large and the number of sessions is high, this command can cause substantial
blocking of other commands. This is truly ironic because there is no need to
ever lock more than a single row in this table while removing expired rows. SQL
Server on the other hand, needs to make sure that a single DELETE statement
either updates all the target rows or none at all. So, it locks them as it
goes, in case it needs to undo the entire statement. With most purposes for
which a DELETE is used, this makes sense, but for removing expired rows, this
introduces unnecessary blocking. By replacing the statement above with the code
in Listing 2, the unnecessary blocking is completely removed.
Listing 2: Replacement Expired Session Deletion
Code
CREATE PROCEDURE dbo.DeleteExpiredSessions
AS
DECLARE @now datetime
SET @now = GETUTCDATE()
CREATE TABLE #ExpiredSessions
( SessionID nvarchar(88) NOT NULL
PRIMARY KEY
)
INSERT #ExpiredSessions (SessionID)
SELECT SessionID
FROM [ASPState_2_0].dbo.ASPStateTempSessions
WHERE Expires < @now
DECLARE SessionCursor CURSOR LOCAL FORWARD_ONLY READ_ONLY
FOR SELECT SessionID FROM #ExpiredSessions ORDER BY CHECKSUM(NEWID())
DECLARE @SessionID nvarchar(88)
OPEN SessionCursor
FETCH NEXT FROM SessionCursor INTO @SessionID
WHILE @@FETCH_STATUS = 0 BEGIN
DELETE FROM [ASPState_2_0].dbo.ASPStateTempSessions
WHERE SessionID = @SessionID
FETCH NEXT FROM SessionCursor INTO @SessionID
END
CLOSE SessionCursor
DEALLOCATE SessionCursor
DROP TABLE #ExpiredSessions
RETURN 0
GO
While I am not a fan of using CURSORs in typical SQL Server
code, their use in this situation is ideal. The procedure works by obtaining a
list of rows to be deleted and then stepping through the list, deleting them
one at a time.
I then found that on the large system, this procedure could
run for over three of the five minutes in its schedule. I decided to make the
procedure work well if more than one copy of it was running at once. In the
code in Figure 6, this is done by accessing the rows in CHECKSUM(NEWID()) order
i.e., random order. This way, if more than one copy of the procedure is running
at once, it is highly likely that the second copy would be deleting different
rows than the first copy. If they both tried to delete the same row, no error
would occur. One would manage to delete it and the other would not receive any
error anyway.