Background
The Cache object in ASP.Net provides a mechanism to share information across pages and users. It is similar to the Application object from classic ASP, but adds support for dependencies and expirations. A particularly useful feature is the ability to set an absolute expiration time at which a cached item is "expired," or removed from the cache. With this technique a web site can improve its performance and reduce middle-tier calls and database queries: data that rarely changes is cached, but is also periodically refreshed for freshness.
To strike a good balance between the needs of the web site content managers and performance, we decided to use a technique we called "On the Hour" caching: particular data is cached until the top of the next hour. This made it easy to explain to content managers when they could expect database changes to be reflected on the site. At the top of each hour the cached item expires, and the next request that needs the data issues a query to retrieve it and repopulates the cache for that hour. After implementing this technique, we avoided 75 million queries during a single month. Unfortunately (or fortunately), load on the web server meant that many individual page requests often occurred simultaneously, each attempting to repopulate the cache. Since the query had to do a great deal of work to populate the cache (which is why caching was so successful), the burst of cache re-population requests at the top of the hour became a significant load on the database server.
This article first demonstrates the "On the Hour" caching technique, and then shows how to ensure that each web server only makes a single request to repopulate the data.
On the Hour Caching
A class is used to manage the cached data, insulating the pages on the site from the details of how the cache is managed. The class provides a public method, GetData(), that returns the data required by the pages, and a private helper, GetNextHour(), that determines the top of the next hour. If the data is requested 1,000 times in a given hour, this technique reduces the number of database calls from 1,000 to 1. The results are especially dramatic when the query is resource intensive or long running, since only a single query is required to repopulate the cache. This not only reduces the time required for a page to complete, but also greatly reduces load on the database server. Web site owners and users are extremely pleased with the performance benefits, and content managers know that they can change the underlying data and it will be reflected on the web site within an hour.
using System;
using System.Web;
using System.Web.Caching;

public class Class1
{
    public Class1()
    {
    }

    //This method handles always returning the data (in this case a
    // simple string, but it could be a DataSet, etc). Pages can simply
    // call this method and it always returns the desired data (either
    // from the cache, or by calling the database to get the data and
    // then populating the cache)
    public static string GetData()
    {
        //Get a reference to the Cache object. It is not automatically
        // available because a standalone class does not inherit from the
        // Page class, which exposes a Cache member
        Cache oCache = HttpContext.Current.Cache;
        //See if the data is already cached (example key "A"); keep
        // a reference in case the cache entry expires while processing
        string data = (string)oCache["A"];
        if (null == data)
        {
            //If not already present, issue a query to get the data.
            // Simulate a query that takes 2 seconds to complete
            System.Threading.Thread.Sleep(2000);
            //Take the data from the call (simulated here by the "data"
            // string below) and add it to the cache, set to expire at
            // the top of the next hour. Do not use a sliding expiration,
            // which would push the timeout out an hour each time the
            // cached data is used. We want an absolute timeout at the
            // top of the next hour so content managers know that
            // changes will be reflected at that time.
            data = "Data";
            oCache.Insert("A", data, null, GetNextHour(1),
                Cache.NoSlidingExpiration);
        }
        //Whether the data was already cached or just retrieved and
        // cached, it is now present in the cache, so return it
        return data;
    }

    private static DateTime GetNextHour(int iHours)
    {
        //Get the current time and add on the desired number of hours
        DateTime oTime = DateTime.Now.AddHours((double)iHours);
        //DateTime values cannot be adjusted in place, so create a new
        // one with the minutes, seconds, and milliseconds zeroed out
        DateTime oNextHour = new DateTime(oTime.Year, oTime.Month,
            oTime.Day, oTime.Hour, 0, 0, 0);
        return oNextHour;
    }
}
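To see the expiration math in isolation, here is a small stand-alone console sketch of the GetNextHour() rounding. The timestamp and the NextHourDemo name are made up for illustration:

```csharp
using System;

class NextHourDemo
{
    //Mirrors GetNextHour(): add the offset, then zero out the
    // minutes, seconds, and milliseconds to land exactly on the hour
    public static DateTime GetNextHour(DateTime now, int iHours)
    {
        DateTime oTime = now.AddHours((double)iHours);
        return new DateTime(oTime.Year, oTime.Month,
            oTime.Day, oTime.Hour, 0, 0, 0);
    }

    static void Main()
    {
        //A request at 1:10:30pm should produce a 2:00pm expiration
        DateTime requestTime = new DateTime(2005, 3, 14, 13, 10, 30);
        Console.WriteLine(
            GetNextHour(requestTime, 1).ToString("yyyy-MM-dd HH:mm"));
        // prints 2005-03-14 14:00
    }
}
```

Note that a request at 1:59pm gets an expiration only one minute away; that is the intended behavior, since the promise to content managers is "changes appear at the top of the hour," not "data is cached for a full hour."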
Race Conditions
The on the hour caching mechanism can increase performance dramatically, since requests for the desired data are now served from the web server cache rather than requiring a query for every page that needs the data. If a user requests A.aspx (which uses the above method) at 1:10pm with an initially empty cache, GetData() retrieves the data and caches it with a 2pm expiration. Every request between 1:10pm and 2pm that needs the data uses the cache locally on the web server. At 2pm ASP.Net expires the cached item, so the cache lookup returns null. If 10 users then make simultaneous requests for .aspx pages at 2:01pm, each one calls GetData() on its own worker thread and attempts to refresh the cache. If the query is particularly resource intensive, 10 simultaneous executions may cause resource problems for the database. To avoid that situation, it is necessary to implement a locking/critical-section solution that ensures only one of the 10 requests actually issues the query. Thankfully, C# provides a lock keyword that guarantees only a single thread at a time can execute a protected block of code. The following is the required restructuring of the GetData() method:
public static string GetData()
{
    Cache oCache = HttpContext.Current.Cache;
    if (null == oCache["A"])
    {
        //Use the lock statement to ensure that only one request
        // can be making a call to RaceForCache() at a time.
        // If a request arrives and the lock is free, it can
        // immediately enter this section and make the call.
        // If the lock is not free, the request waits until
        // the lock holder completes the RaceForCache() call.
        // WARNING: Once the original lock holder completes,
        // one of the other requests will be allowed in and
        // will also attempt to call RaceForCache()!
        lock(typeof(lockHolder))
        {
            lockHolder.RaceForCache();
        }
    }
    return (string)oCache["A"];
}

private class lockHolder
{
    public static void RaceForCache()
    {
        //Since each request that saw the cache as empty will
        // eventually get here, ensure that the data has not
        // already been retrieved by another request
        if (null == HttpRuntime.Cache["A"])
        {
            //Simulate the query, then repopulate the cache with
            // the same top-of-the-hour absolute expiration as
            // before (a plain indexer assignment would cache the
            // data with no expiration at all)
            System.Threading.Thread.Sleep(2000);
            HttpRuntime.Cache.Insert("A", "Data", null, GetNextHour(1),
                Cache.NoSlidingExpiration);
        }
    }
}
With the locking changes implemented, the 10 simultaneous requests for the cached data require only a single query rather than 10. The first request reaches the lock statement, finds it free, and continues immediately into the RaceForCache() method, where it issues the query (simulated by the 2-second sleep) and refreshes the cache. The other 9 requests that arrived at the same time each attempt to acquire the lock, see that it is already in use, and block while waiting for it to be released. When the first thread completes, one of the other 9 acquires the lock and calls RaceForCache(). That thread checks the cache, sees that it has already been filled, and simply returns to GetData(), where it uses the data that was already retrieved. The cache check inside RaceForCache() is critical; without it, each subsequent thread would enter the method and issue the query again, which is exactly what we are trying to avoid. Each of the remaining requests follows the same pattern. We have successfully allowed each of the 10 requests to use the latest data while issuing only a single query. The lock statement is also exception safe: if an exception occurs inside RaceForCache(), the lock is automatically released so another thread can acquire it.
WARNING: The locking technique demonstrated here should only be used when the situation is similar to the one described (load problems during cache repopulation). Locking in general can lead to deadlocks if not handled properly. It is critical when using the above technique that the lockHolder class be declared private, so that no outside code can lock on the same Type object. Most sites can simply use the on the hour caching technique and do not need to worry about race conditions.
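The same single-query behavior can also be achieved without locking on a Type object at all, by locking on a dedicated private object. The sketch below is a stand-alone simulation, not the article's ASP.Net code: the Cache is replaced by a plain static field and the query by a short Sleep so it can run outside a web context, and the CacheRefresher name is illustrative:

```csharp
using System;
using System.Threading;

class CacheRefresher
{
    //A dedicated private lock object; unlike typeof(CacheRefresher),
    // nothing outside this class can ever lock on it
    private static readonly object cacheLock = new object();
    //Stand-in for the ASP.Net Cache entry
    private static string cachedData = null;

    public static string GetData()
    {
        if (null == cachedData)
        {
            lock (cacheLock)
            {
                //Double-check inside the lock: another thread may have
                // repopulated the data while this one was waiting
                if (null == cachedData)
                {
                    Thread.Sleep(200); //simulate the expensive query
                    cachedData = "Data";
                }
            }
        }
        return cachedData;
    }

    static void Main()
    {
        //Ten simultaneous requests; only the first executes the "query"
        Thread[] threads = new Thread[10];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() => GetData());
            threads[i].Start();
        }
        foreach (Thread t in threads)
            t.Join();
        Console.WriteLine(GetData()); // prints Data
    }
}
```

Because the lock object is private, no unrelated code can contend for it or deadlock on it, which removes the main risk noted in the warning above.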
Conclusion
The Cache object provides a great opportunity to increase performance and scalability for ASP.Net sites. It is particularly useful for data that rarely changes and requires a great deal of processing to acquire. The "On The Hour" caching technique provides a fair tradeoff between the performance of the site and the need for content managers to update information on the site. On heavily used web sites, the locking technique can prevent race conditions and an overload of queries at the top of each hour.
Send comments or questions to robertb@aspalliance.com.