Use Write Caching to Optimize High Volume Data Driven Applications
by Steven Smith

Persisting Update Data to the Database

At some point, data from the in-memory cache has to be written to the database.  Typically, this is done periodically on a fixed time interval, but it could also be triggered manually or by other events, such as a counter reaching a certain threshold.  The longer the interval between updates, the more expensive each update operation becomes and the greater the risk of data loss.  Thus, it is probably best to perform the updates regularly, using a relatively short interval.  For my tests, I used an interval of 5 seconds.
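If you go the fixed-interval route, a System.Threading.Timer is one simple way to trigger the flush.  The following is a minimal sketch, not the article's download code: ActivityCacheManager and its PersistToDatabase method are placeholder names for whatever component performs the write described in the rest of this section.

// A minimal sketch of a timer-driven flush. ActivityCacheManager and
// PersistToDatabase() are placeholder names, not part of the article's code.
using System;
using System.Threading;

public class CacheFlushTimer
{
 private Timer timer;

 public void Start()
 {
  // Fire the callback every 5 seconds, starting 5 seconds from now.
  timer = new Timer(new TimerCallback(Flush), null,
      TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(5));
 }

 private void Flush(object state)
 {
  try
  {
   // Persist the cached counters to the database.
   ActivityCacheManager.PersistToDatabase();
  }
  catch (Exception)
  {
   // Log and continue; the next interval will retry.
  }
 }
}

Note that Timer callbacks run on a thread-pool thread, so whatever collection holds the counters must be safe for concurrent access (a point also raised in the comments below).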

Actually performing the writes can be done in any number of ways.  I only implemented one technique for my scenario, but the others are worth noting and may offer advantages in your situation.

  A. Pure ADO.NET – loop through the cached items and call an appropriate stored procedure (or parameterized SQL statement) for each one.
  B. ADO.NET DataTable/DataSet – build a DataTable from the database schema, load it with the data, and synchronize it back to the database. (The in-memory cache could even use a DataTable instead of a Hashtable if this approach is used.)
  C. Munge all of the data into a comma-delimited string and pass it to a custom stored procedure that parses the string and performs all of the necessary inserts and updates.
  D. Serialize all of the data into an XML document and pass it to a stored procedure that parses the XML into a table structure using sp_xml_preparedocument.

I’m sure there are other options as well, but these were the ones I considered.  Options A and B both require at least one database round trip per data item, so I ruled them out immediately.  Option C seems like a bit of a hack, although I know its performance can easily exceed that of XML; it results in only one database round trip per batch update, but it requires writing (or, more likely, finding via Google) some custom string-parsing SQL.  Option D eliminates the need for any custom parsing code, requires only a single round trip to the database, and uses standard XML rather than a custom string format.  Although it might be a little slower, I’ve heard of this XML thingy, and I think it might catch on, so I went with Option D.  A fifth option, queuing the updates with something like MSMQ, may be the subject of another article.
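To make the per-item round-trip cost of option A concrete, here is a rough sketch; the stored procedure name (ads_FeaturedItemActivity_Update), the connection string handling, and the item property names are assumptions for illustration, not the article's code.

// A rough sketch of option A: one stored procedure call, and therefore one
// database round trip, per cached item. ads_FeaturedItemActivity_Update and
// the item properties are assumed names for illustration.
using System.Data;
using System.Data.SqlClient;

public void PersistOneByOne(ActivityCollection activities, string connectionString)
{
 using (SqlConnection conn = new SqlConnection(connectionString))
 {
  conn.Open();
  foreach (FeaturedItemActivity item in activities.Values)
  {
   SqlCommand cmd = new SqlCommand("ads_FeaturedItemActivity_Update", conn);
   cmd.CommandType = CommandType.StoredProcedure;
   cmd.Parameters.Add("@id", SqlDbType.Int).Value = item.Id;
   cmd.Parameters.Add("@date", SqlDbType.DateTime).Value = item.ActivityDate;
   cmd.Parameters.Add("@impressions", SqlDbType.Int).Value = item.Impressions;
   cmd.Parameters.Add("@clicks", SqlDbType.Int).Value = item.Clicks;
   cmd.ExecuteNonQuery(); // one round trip per item -- the problem
  }
 }
}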

One consideration with the technique I’m using is the possibility that some data may be lost.  Storing the counters in memory means that a machine reboot, application reset, or application update will cause any not-yet-persisted data to be lost.  In my scenario this is acceptable, since my cache interval is only 5 seconds and application resets are a rare occurrence.
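If even that small window of loss were unacceptable, one easy mitigation (my suggestion here, not part of the article's code) is to flush the cache one final time when the application shuts down gracefully; a hard crash would still lose up to one interval's worth of counts.

// In Global.asax: flush cached counters on graceful shutdown. This helps
// for orderly resets and deployments only; ActivityCacheManager is the
// same placeholder name used in the timer sketch above.
protected void Application_End(object sender, EventArgs e)
{
 ActivityCacheManager.PersistToDatabase();
}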

Serializing the Collection Data to XML

I haven’t yet gotten to the stored procedure side of things, but I know that I’m going to end up passing it a big string filled with XML-formatted data.  So, I need an XML format and a way to convert my data into it.  Figure 5 shows an example of the XML I came up with (heavily borrowed from my friend Keith Barrows):

Figure 5: Sample XML Sent to Stored Procedure

<ROOT>
   <Activity id='234' date='12/12/2005' impressions='23344' clicks='2344'  />
   <Activity id='1' date='12/13/2005' impressions='2356' clicks='53'  />
</ROOT>


To produce the XML, I simply overrode the ToString method (since I wasn’t using it for anything else) on my Activity class to return the contents of the class formatted as an <Activity /> element.  Another option would have been to use XML Serialization for the Activity class.  I opted to go the ToString route because it was simpler and, I believe, better performing.  Then, in the ActivityCollection class, I overrode ToString once more, to include all members of the collection as strings, wrapped with <ROOT> and </ROOT>.  Figure 6 shows the code for Activity, and Figure 7 shows ActivityCollection.

Figure 6: Serializing Individual Activity Items to XML

public override string ToString()
{
 System.Text.StringBuilder sb = new System.Text.StringBuilder(100);
 sb.Append("<Activity id='");
 sb.Append(this.id.ToString());
 sb.Append("' date='");
 sb.Append(this.activityDate.ToShortDateString());
 sb.Append("' impressions='");
 sb.Append(this.impressions.ToString());
 sb.Append("' clicks='");
 sb.Append(this.clicks.ToString());
 sb.Append("' />");
 return sb.ToString();
}


Figure 7: Serializing the ActivityCollection to XML

public override string ToString()
{
 System.Text.StringBuilder sb = new System.Text.StringBuilder(this.Count * 100);
 sb.Append("<ROOT>\n");
 foreach(FeaturedItemActivity item in this.Values)
 {
  sb.Append(item.ToString());
 }
 sb.Append("\n</ROOT>");
 return sb.ToString();
}


Note that both of these methods use a StringBuilder, rather than string concatenation.  In the case of Activity, string concatenation would work about as quickly as a StringBuilder (I tested it), so feel free to go either way there.  However, ActivityCollection really should use a StringBuilder in its ToString method, since there is a potentially large (and unknown in advance) number of concatenations to be made, and that could easily hurt performance.  (For more on StringBuilder versus string concatenation, see Improving String Handling Performance in .NET Framework Applications.)
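For reference, the plain-concatenation version of Figure 6 would look something like the sketch below.  Because it is a single expression, the compiler turns it into one String.Concat call, which is why it performs on par with StringBuilder for this fixed, small number of appends.

// The string-concatenation equivalent of Figure 6. As a single expression,
// this compiles down to one String.Concat call, so it performs about the
// same as the StringBuilder version for this handful of appends.
public override string ToString()
{
 return "<Activity id='" + this.id.ToString() +
     "' date='" + this.activityDate.ToShortDateString() +
     "' impressions='" + this.impressions.ToString() +
     "' clicks='" + this.clicks.ToString() +
     "' />";
}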


All that remains is to write a stored procedure that will take this XML string, parse it, and INSERT or UPDATE the contents as required.  Figure 8 shows just such a procedure.  It uses the sp_xml_preparedocument and sp_xml_removedocument procedures to parse the XML string into a table structure (using OPENXML), after which the data can be used essentially as a table.  By performing the UPDATE first, for keys that already exist in the table, and then the INSERT, for keys that do not, we ensure that each item is either updated or inserted, but never both.  One possible optimization would be to place the contents of the XML document into a temporary table to avoid the cost of calling OPENXML twice; however, I haven’t researched this enough to know whether it would make much of a difference (feel free to comment).

Figure 8: Using sp_xml_preparedocument To Create a Bulk Insert Stored Procedure

CREATE PROCEDURE ads_FeaturedItemActivity_BulkInsert 
  @doc text -- the XML document, passed as a string
AS
DECLARE @idoc int
-- Create an internal representation (virtual table) of the XML document...
EXEC sp_xml_preparedocument @idoc OUTPUT, @doc

-- Perform UPDATEs for rows that already exist
UPDATE ads_FeaturedItemActivity
SET ads_FeaturedItemActivity.impressions = ads_FeaturedItemActivity.impressions + ox2.impressions,
    ads_FeaturedItemActivity.clicks = ads_FeaturedItemActivity.clicks + ox2.clicks
FROM OPENXML (@idoc, '/ROOT/Activity', 1) -- 1 = attribute-centric mapping
     WITH ( [id]        int
          , [date]      datetime
          , impressions int
          , clicks      int
          ) ox2
WHERE ads_FeaturedItemActivity.FeaturedItemId = ox2.[id]
  AND ads_FeaturedItemActivity.ActivityDate = ox2.[date]

-- Perform INSERTs for rows that do not yet exist
INSERT INTO ads_FeaturedItemActivity
     ( FeaturedItemId
     , ActivityDate
     , Impressions
     , Clicks
     )
SELECT ox.[id]
     , ox.[date]
     , ox.impressions
     , ox.clicks
  FROM OPENXML (@idoc, '/ROOT/Activity', 1)
       WITH ( [id]        int
            , [date]      datetime
            , impressions int
            , clicks      int
            ) ox
 WHERE NOT EXISTS
       ( SELECT 1 FROM ads_FeaturedItemActivity
          WHERE FeaturedItemId = ox.[id] AND ActivityDate = ox.[date] )

-- Remove the 'virtual table' now...
EXEC sp_xml_removedocument @idoc
GO
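For completeness, here is a sketch of how the collection's XML might be handed to this procedure from C#.  The method shown is illustrative, not from the article's download, and connection handling is simplified.

// A minimal sketch of invoking the bulk insert/update procedure with the
// XML produced by ActivityCollection.ToString(). The method name and
// connection handling are simplified for illustration.
using System.Data;
using System.Data.SqlClient;

public void PersistToDatabase(ActivityCollection activities, string connectionString)
{
 using (SqlConnection conn = new SqlConnection(connectionString))
 {
  SqlCommand cmd = new SqlCommand("ads_FeaturedItemActivity_BulkInsert", conn);
  cmd.CommandType = CommandType.StoredProcedure;
  // The procedure declares its parameter as text, so pass the XML as Text.
  cmd.Parameters.Add("@doc", SqlDbType.Text).Value = activities.ToString();
  conn.Open();
  cmd.ExecuteNonQuery(); // a single round trip for the entire batch
 }
}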


You can see another example of this technique here: SQLXML.org – How to Insert and Update with OpenXML.  Now that we have a way to create the XML and consume it, we just have to figure out how to send the updates to the database asynchronously, rather than on every request.



User Comments

Title: Modern SQL XML Support   
Name: Steve Smith
Date: 2008-08-08 2:33:41 PM
Comment:
Jason Follas posted recently on how to do the SQL XML work more efficiently in SQL 2005+:
http://jasonfollas.com/blog/archive/2008/06/19/coding-in-sql-server-an-evolution.aspx
Title: MSMQ   
Name: Derek
Date: 2008-01-14 9:37:07 AM
Comment:
Thanks for the article. Having some experience with MSMQ I can tell you it will probably perform well, but we've had a lot of issues with using it in a clustered environment. This is a well-known shortcoming of MSMQ.

We've gone to some 3rd party pub-sub solutions that work much more efficiently and work well in a clustered environment.
Title: Using a synchronized HashTable   
Name: Tim
Date: 2005-08-22 1:41:25 PM
Comment:
Also, how would this be affected by using Hashtable.Synchronized(...) to create the hashtable? It appears that automatically protects the Hashtable's Add method and the indexing property's Set accessor with a lock, which would protect you during writes.
Title: What about the code for updating the item in Cache?   
Name: Tim
Date: 2005-08-22 12:08:14 PM
Comment:
Can you show sample code for updating the item in Cache? For example, if an item is added to the cache with a certain expiration, etc., when you update it, do you have to add it back with the same parameters, or do you just reference it like Cache[myitem] = updatedValue, and it will retain all previously set values...

Thanks
Title: Programmer   
Name: Microfolx
Date: 2005-05-21 10:33:19 AM
Comment:
This is a very nice site. Keep it up.
Title: Re: Consistency   
Name: Brett
Date: 2005-05-20 5:15:54 PM
Comment:
Nice article... I might consider trying normal XML writers rather than StringBuilder. Or even XML serialization. Probably slightly slower, but reduces errors.

On the point regarding charging for page views, I have two suggestions. One is to periodically reload the current click count from the database and not worry about being exact. Hey, if they get 1 million or 1,005,000 views, does it really matter? This isn't the space shuttle. The second would be to switch to a non-caching code path when you get close to whatever your maximum value is. So when you hit, say, 990,000 views (based on your hit rate, etc.), switch to the non-caching version for that customer.
Title: Consistency   
Name: Brian
Date: 2005-05-20 2:03:29 PM
Comment:
I had to accomplish this exact same scenario. The problem I ran into was the caching. Let’s say that one of your advertisers pays you to show their ad 1 million times. How can you ensure that it doesn’t go over? If you are holding the most current counters in cache, won’t each process be looking at its own cache, not to mention each server in the web farm and all of its processes looking at their own caches?

Another general Cache question. When I write something to cache, isn’t it only available to that particular process? So if you set up your application to have 5 worker processes, won’t that result in 5 separate cache entries?
Title: web farm   
Name: parker
Date: 2005-04-24 3:19:08 AM
Comment:
How do you account for disparate sessions in a web farm scenario? Is the cache shared among the servers?
Title: Too Good   
Name: Zulfiqar Damani
Date: 2005-04-18 12:46:36 AM
Comment:
It was presented very well, and it was interesting, too.
Title: Thanks   
Name: Happel
Date: 2005-04-14 8:23:23 AM
Comment:
Thanks for sharing this article, very interesting.





