Requirements: IP*Works! .Net Edition
Podcast Demo: Download
Content Syndication
The RSS and Atom XML formats allow content to be syndicated in standard ways. An aggregator is an application that brings many of these syndicated feeds together so that you can read information from many different sources in one place. This lets you read what interests you from CNN, Netflix, the local news, knowledgeable people in your field, and any other feeds you follow, much more quickly and efficiently than if you had to visit each of these websites individually with a browser.
Podcasting
Podcasting is a recent trend that applies RSS technology to the distribution of binary media. What is podcasting? It is the process of automatically feeding media (usually audio, but it could be anything: video, images, etc.) to a device. It gets its name from casting these media feeds to iPods: "broadcasting" to an iPod, or podcasting.
A podcasting application monitors media feeds (RSS, Atom) for content, automatically downloads that content, and then makes it available for copying to a mobile device (cell phone, iPod, Pocket PC). For audio media, it would also be nice to be able to write the media to a CD for later listening (for those without a mobile handheld device).
In this article I'll walk through the basics of writing a simple podcasting application in C#. I'll use IP*Works! .Net Edition, specifically its RSS component, to monitor a media feed and automatically download enclosures. I will cover the basics: downloading the feeds, parsing them, finding new items, and downloading the enclosures in those new items. If you'd like to see the full project, it is available from the Podcast Demo download at the top of this article.
The application can be expanded to take specific action depending on the enclosure type. For example:
- Audio/video enclosures can be automatically copied to a mobile device for later listening/viewing, burned to a CD-R/RW, or copied to a playlist folder that is synced with auto playlists in a media player.
- Graphic enclosures can be automatically copied to a graphics library folder for later viewing.
- BitTorrent enclosures (links to .torrent files) could be downloaded automatically using BitTorrent (a P2P application designed to "share" network bandwidth). Some podcast producers use BitTorrent as a transport in order to reduce the bandwidth burden of serving large audio/video files from their web servers to many consumers.
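The type-based handling in the list above could start as a simple dispatch on the enclosure's type attribute. This is a minimal C# sketch, not part of IP*Works!: the EnclosureAction names are hypothetical, and treating "application/x-bittorrent" as the BitTorrent case is an assumption based on the MIME type commonly used for .torrent files.

```csharp
using System;

// Hypothetical actions; the names are illustrative, not part of IP*Works!.
public enum EnclosureAction
{
    CopyToPlaylist,        // audio/video: playlist folder, mobile device, or CD
    CopyToGraphicsLibrary, // images
    HandOffToBitTorrent,   // .torrent links
    KeepInDownloads        // anything else stays in the feed's download folder
}

public static class EnclosureRouter
{
    // Chooses an action from the enclosure@type attribute (a MIME type).
    public static EnclosureAction Route(string mimeType)
    {
        string type = (mimeType ?? "").Trim().ToLowerInvariant();
        if (type.StartsWith("audio/", StringComparison.Ordinal) ||
            type.StartsWith("video/", StringComparison.Ordinal))
            return EnclosureAction.CopyToPlaylist;
        if (type.StartsWith("image/", StringComparison.Ordinal))
            return EnclosureAction.CopyToGraphicsLibrary;
        // Assumption: .torrent enclosures are served as application/x-bittorrent.
        if (type == "application/x-bittorrent")
            return EnclosureAction.HandOffToBitTorrent;
        return EnclosureAction.KeepInDownloads;
    }
}
```

Unknown types fall through to KeepInDownloads, so a new media type never causes a file to be lost, only left where it was downloaded.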
If I were building a simple news aggregator, such as the "RSS" demo included with IP*Works!, I would wait for the user to click on a feed, then download and display the feed's items and let the user select which ones to read. When the user clicks on a news item, the item's details would be displayed.
A podcasting application works differently. Instead of waiting for the user to click a feed and then downloading and displaying its text items, the podcasting application automatically checks each feed periodically. If a feed contains new items with binary enclosures, the application downloads them behind the scenes without bothering the user with the decision. Whenever the user is ready, the new items are available, and they can optionally be copied onto a mobile device automatically.
We'll be porting this C# application to Cocoa for Mac OS X users, and I'll post that code when it's finished. On the Mac, iTunes can automatically copy the files to a connected iPod. I'll also be porting the code to a .Net CF application. These ports should be straightforward because IP*Works! offers the same API on many different platforms, including the Unix and .Net CF Editions.
Downloading Feeds
A news aggregator and a podcasting application both need to let the user "subscribe" to different feeds (URLs). Each feed URL points to an XML-based file whose format depends on the feed. Podcast feeds are typically in RSS 2.0 format, which supports enclosure elements. Non-podcast news feeds can be in any of the "standard" formats, such as RDF, RSS 1.0, RSS 2.0, or Atom. The news aggregator or podcasting application parses these files. If you are new to podcasting, here are some interesting podcast feeds:
- IT Conversations: http://www.itconversations.com/rss/recentWithEnclosures.php
- Channel 9: http://channel9.msdn.com/rss.aspx
- The Slashdot Review: http://slashdotreview.com/wp-rss2.php
- The Scripting News: http://www.scripting.com/rss.xml
I'll download these feeds with the RSS component included in IP*Works!. The RSS component automatically parses the raw XML data into the individual items contained in the feed.
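Before any items can be read, a parser has to recognize which of the formats mentioned above it is looking at. As a rough illustration of that first step (independent of IP*Works!, using .NET's System.Xml.Linq), this sketch tells the common formats apart by their root element. FeedFormat is a hypothetical name, and the labels it returns are only descriptive.

```csharp
using System.Xml.Linq;

// Illustration only: identify a feed's format from its root element.
public static class FeedFormat
{
    static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
    static readonly XNamespace Atom10 = "http://www.w3.org/2005/Atom";
    static readonly XNamespace Atom03 = "http://purl.org/atom/ns#";

    public static string Detect(string xml)
    {
        XElement root = XDocument.Parse(xml).Root;
        // RSS 0.9x and 2.0 use an un-namespaced <rss version="..."> root.
        if (root.Name.LocalName == "rss")
            return "RSS " + ((string)root.Attribute("version") ?? "(unknown version)");
        // RSS 1.0 (and 0.90) documents are RDF, rooted at <rdf:RDF>.
        if (root.Name == Rdf + "RDF")
            return "RDF (RSS 1.0 or 0.90)";
        // Atom 1.0 and the older Atom 0.3 both use a <feed> root, in different namespaces.
        if (root.Name == Atom10 + "feed" || root.Name == Atom03 + "feed")
            return "Atom";
        return "unknown";
    }
}
```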
Parsing Feeds
The RSS component parses the feeds for you, so for this step all that is required is to call its GetFeed() method. Before doing so, I'll configure the RSS component to send an If-Modified-Since date in the HTTP request header. This is very important: it tells the web server to send the feed XML only if the feed has been modified since the specified date (otherwise the server can reply with 304 Not Modified and no body). This preserves server bandwidth, which matters because many news aggregators and podcast applications are configured to check for new items on a regular basis. The first time a feed is retrieved, its lastreaddate is an empty string. Every time the feed is downloaded, the server sends its last-modified date in the response, and the application saves it for later use.
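//send If-Modified-Since with the date saved from the last download (empty the first time)
rss1.Config("IfModifiedSince=" + lastreaddate);
//download and parse the feed
rss1.GetFeed(feedurl);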
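To show what the IfModifiedSince setting amounts to on the wire, here is a sketch of the same conditional GET written with .NET's System.Net.Http types rather than IP*Works!. It assumes the saved lastreaddate string is the raw Last-Modified header value in RFC 1123 format (for example "Sun, 06 Nov 1994 08:49:37 GMT"); ConditionalGet and its methods are hypothetical names. It only builds the request and reads the response, so sending it with an HttpClient is left to the caller.

```csharp
using System;
using System.Globalization;
using System.Net;
using System.Net.Http;

// Sketch of a conditional feed request (not the IP*Works! implementation).
public static class ConditionalGet
{
    // Builds a GET request; adds If-Modified-Since only when a saved date parses.
    public static HttpRequestMessage BuildFeedRequest(string feedUrl, string lastReadDate)
    {
        var request = new HttpRequestMessage(HttpMethod.Get, feedUrl);
        if (DateTimeOffset.TryParseExact(lastReadDate, "r", CultureInfo.InvariantCulture,
                DateTimeStyles.AssumeUniversal, out DateTimeOffset since))
        {
            request.Headers.IfModifiedSince = since;
        }
        return request;
    }

    // 304 Not Modified means no feed body was sent: keep the old date and skip parsing.
    // Otherwise remember the server's Last-Modified value (if any) for the next request.
    public static string NextLastReadDate(HttpResponseMessage response, string previous)
    {
        if (response.StatusCode == HttpStatusCode.NotModified) return previous;
        DateTimeOffset? modified = response.Content?.Headers.LastModified;
        return modified.HasValue
            ? modified.Value.ToString("r", CultureInfo.InvariantCulture)
            : previous;
    }
}
```

An empty lastreaddate simply fails to parse, so the first request for a feed goes out without the header, matching the behavior described above.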
After the component retrieves the feed XML, it automatically parses it. The result is a set of arrays holding information about each item in the feed. For example, the item titles are in the ItemTitle[] array and the item descriptions are in the ItemDescription[] array. Other item elements, which RSS 2.0 does not require, are accessible through the GetProperty method. This method is provided so that changes in the RSS specification will not break your application: you can get any element you wish using GetProperty.
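To make the "set of arrays" idea concrete, here is a rough sketch using .NET's LINQ to XML rather than IP*Works!. ParsedFeed and GetItemElement are hypothetical stand-ins that only mirror the idea of ItemTitle[], ItemDescription[], and GetProperty; they are not the component's implementation, and the sketch handles only un-namespaced RSS 2.0 items.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustration only: an RSS 2.0 feed flattened into parallel arrays,
// one entry per <item>, in the spirit of ItemTitle[] and ItemDescription[].
public class ParsedFeed
{
    public string[] ItemTitle;       // 0-based, like any C# array
    public string[] ItemDescription;
    private readonly XElement[] items;

    public ParsedFeed(string xml)
    {
        items = XDocument.Parse(xml).Descendants("item").ToArray();
        // A missing child element becomes "" so every array has one entry per item.
        ItemTitle = items.Select(i => (string)i.Element("title") ?? "").ToArray();
        ItemDescription = items.Select(i => (string)i.Element("description") ?? "").ToArray();
    }

    public int ItemCount => items.Length;

    // Generic access to any child element of item n (1-based, like item[n]),
    // so elements without a dedicated array (author, guid, ...) are still reachable.
    public string GetItemElement(int n, string elementName)
    {
        if (n < 1 || n > items.Length) return "";
        return (string)items[n - 1].Element(elementName) ?? "";
    }
}
```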
Podcast items can include an enclosure element that points to the actual binary data being enclosed. This element has three required attributes: url (the URL of the data), length (its size in bytes), and type (its MIME type, e.g. "audio/mpeg"). You can use the GetProperty method to get the enclosure element and any of its attributes using the syntax:
item/element@attribute
For example, to get the url attribute of the enclosure element in the 5th item:
string url = rss1.GetProperty("item[5]/enclosure@url");
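The path syntax can be mirrored outside the component to see what it resolves to. This sketch uses LINQ to XML, assumes 1-based item numbering as in the item[5] example, and returns an empty string when the item, enclosure, or attribute is missing, which is how the loop that follows treats an item without an enclosure. EnclosureLookup is a hypothetical name, not an IP*Works! class.

```csharp
using System.Linq;
using System.Xml.Linq;

// Illustration only: resolves "item[n]/enclosure@attribute" against an RSS 2.0 document.
public static class EnclosureLookup
{
    public static string Get(XDocument feed, int n, string attribute)
    {
        if (n < 1) return "";
        XElement item = feed.Descendants("item").Skip(n - 1).FirstOrDefault();
        if (item == null) return "";              // fewer than n items
        XElement enclosure = item.Element("enclosure");
        // A missing enclosure or attribute yields "" rather than an exception.
        return (string)enclosure?.Attribute(attribute) ?? "";
    }
}
```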
Now I can loop through each item in the feed and check for enclosures.
for (int j = 1; j <= rss1.ItemCount; j++)
{
    string item = "item[" + j.ToString() + "]";
    //the type can be used to decide what to do with the enclosure
    string type = rss1.GetProperty(item + "/enclosure@type");
    string url = rss1.GetProperty(item + "/enclosure@url");
    string bytes = rss1.GetProperty(item + "/enclosure@length");
    //if there is an enclosure url, remember it (used later to clean the history)
    if (url != "" && !enclosures.Contains(url)) enclosures.Add(url);
    //and if the url is new (never been downloaded before), download it:
    if (url != "" && isNew(feedname, url))
    {
        //length may be missing or malformed in some feeds, so don't let it throw
        decimal length;
        totalbytes = decimal.TryParse(bytes, out length) ? length : 0;
        DownloadEnclosure(feedname, url);
    }
}
Finding New Items
Note in the code above that the decision to download a particular enclosure is made based on whether or not there is an enclosure URL and whether or not the URL is new. isNew is a boolean function to check a "history" of downloads. If the URL has been downloaded already, there is no need to download it again. You can maintain your history any way you wish, but I'll quickly explain the simplistic way that I've dealt with it.
I've used a set of global arraylists to keep track of several pieces of management data in my application:
- private ArrayList history = new ArrayList();
- private ArrayList enclosures = new ArrayList();
- private ArrayList donotclean = new ArrayList();
- private ArrayList subscriptions = new ArrayList();
- private ArrayList bookmarks = new ArrayList();
The history arraylist is synced with a data file at runtime. It contains all of the enclosure URLs that have been downloaded in the past.
The history data file is kept "clean" by removing items that are no longer available in their feeds. This keeps the history file from growing indefinitely and holds it at a stable size. I do this by keeping a list of all enclosure URLs found in each feed; all of these URLs are saved in the enclosures arraylist. When I save the history file, I compare the arraylist of all current enclosures (the enclosures arraylist) with the arraylist of the saved history. Anything in the history list that is no longer available in the feeds can be deleted from the history. Note that this assumes every enclosure has a unique URL. The RSS specification does not require a unique identifier for each item, nor a permanent URL (permaLink).
The donotclean arraylist is used to list feeds whose items should not be deleted from the history. This is necessary in order to support the IfModifiedSince HTTP request header, which is very important. If the feed has not changed since the IfModifiedSince date, no items will be returned (and so they will not be in the enclosures arraylist), but we do not want to clean them from the history arraylist.
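The cleanup described in the last two paragraphs can be sketched as follows. This is an illustration, not the author's code: it uses generic collections instead of ArrayList, assumes history entries have the "feed|url" form that the isNew function writes, and assumes donotclean holds feed names. HistoryCleaner is a hypothetical name.

```csharp
using System.Collections.Generic;

// Sketch of the history cleanup: drop entries whose enclosure URL is no longer
// offered by any feed, unless the entry's feed is listed in doNotClean.
public static class HistoryCleaner
{
    public static List<string> Clean(IEnumerable<string> history,
                                     ICollection<string> enclosures,
                                     ICollection<string> doNotClean)
    {
        var kept = new List<string>();
        foreach (string entry in history)
        {
            // Entries are assumed to be "feed|url"; split on the first '|'.
            int bar = entry.IndexOf('|');
            string feed = bar < 0 ? "" : entry.Substring(0, bar);
            string url = bar < 0 ? entry : entry.Substring(bar + 1);
            // Keep the entry if its feed was not re-read (e.g. 304 Not Modified)
            // or if the URL is still offered by some feed.
            if (doNotClean.Contains(feed) || enclosures.Contains(url))
                kept.Add(entry);
        }
        return kept;
    }
}
```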
The subscriptions arraylist is used to keep track of all of the feeds the application is currently subscribed to. The subscriptions arraylist also keeps track of the lastmodified date for each feed. This is also synched with a data file at runtime.
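One way to sync subscriptions and their last-modified dates with a data file is sketched below. The tab-separated layout (name, feed URL, last-modified date) and the Subscription and SubscriptionFile names are hypothetical choices for illustration; the article does not specify its file format.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical record for one subscribed feed.
public class Subscription
{
    public string Name;
    public string Url;
    public string LastModified = ""; // empty until the feed has been read once
}

// Sketch of saving/loading subscriptions as one tab-separated line per feed.
public static class SubscriptionFile
{
    public static void Save(string path, IEnumerable<Subscription> subs)
    {
        var lines = new List<string>();
        foreach (Subscription s in subs)
            lines.Add(s.Name + "\t" + s.Url + "\t" + s.LastModified);
        File.WriteAllLines(path, lines);
    }

    public static List<Subscription> Load(string path)
    {
        var subs = new List<Subscription>();
        if (!File.Exists(path)) return subs; // first run: no subscriptions yet
        foreach (string line in File.ReadAllLines(path))
        {
            string[] parts = line.Split('\t');
            if (parts.Length < 2) continue; // skip blank or malformed lines
            subs.Add(new Subscription
            {
                Name = parts[0],
                Url = parts[1],
                LastModified = parts.Length > 2 ? parts[2] : ""
            });
        }
        return subs;
    }
}
```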
My application uses the Windows Media Player ActiveX control for listening to and viewing downloaded items. The bookmarks arraylist keeps track of any bookmarks saved in the application. I've implemented bookmarks as a way to pick up listening to a particular feed item (audio or video) where you left off. The bookmarks are also synced with a data file.
The history, donotclean, and enclosures arraylists form the basis of my structure for determining if an item is new or not. I can just check to see if the enclosure url is contained in the history arraylist. If it is not, I know it is new. The isNew function does this for me:
private bool isNew(string feed, string url)
{
    //if it's already in the history, it's not new; move on...
    if (history.Contains(feed + "|" + url)) return false;
    //otherwise it's not there yet, so record it in the history
    history.Add(feed + "|" + url);
    return true;
}
If the item is new, the isNew function adds it to my history arraylist. When isNew returns true, the DownloadEnclosure method is called to download the new item.
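ArrayList.Contains scans the list from the start, so each isNew call gets slower as the history grows. If that ever matters, the same check can be sketched with a HashSet&lt;string&gt;, whose Add method returns false when the entry is already present. DownloadHistory is a hypothetical name, and this is an alternative to the approach above rather than what the demo does.

```csharp
using System.Collections.Generic;

// Same behavior as isNew, backed by a HashSet so lookups stay fast
// as the history grows. HashSet.Add returns false if the entry already exists.
public class DownloadHistory
{
    private readonly HashSet<string> entries = new HashSet<string>();

    public bool IsNew(string feed, string url)
    {
        // Records the "feed|url" entry and reports whether it was new.
        return entries.Add(feed + "|" + url);
    }

    public int Count => entries.Count;
}
```

A HashSet does not preserve insertion order, which is fine for a history that is only used for membership checks.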
Downloading Enclosures
The DownloadEnclosure method simply downloads the enclosure. It sets the RSS component's Localfile configuration setting to the path of the local file where the data should be saved, and then calls the GetURL method with the enclosure's URL. I also use the Transfer event, which fires periodically during the transfer, to drive a progress bar for the download.
private void DownloadEnclosure(string feedname, string url)
{
    //the fileizeURL function converts a URL to a suitable filename
    string filename = fileizeURL(url);
    //All downloaded items are saved in a subdirectory of the main download path.
    //The subdirectory name is the name of the feed:
    string folder = System.IO.Path.Combine(txtPath.Text, feedname);
    System.IO.Directory.CreateDirectory(folder);
    //tell the component where to save the data, then fetch the enclosure
    rss1.Config("Localfile=" + System.IO.Path.Combine(folder, filename));
    rss1.GetURL(url);
}
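The article doesn't show fileizeURL, so here is one possible version, plus a helper for the progress bar. Both are hypothetical sketches: FileizeUrl keeps the last path segment (plus any query string, so that different download URLs for the same script don't collide) and replaces characters Windows forbids in file names; ProgressPercent assumes the Transfer event gives you a running byte count, so check the component's documentation for the event's actual arguments.

```csharp
using System;

// Hypothetical helpers for DownloadEnclosure; not the author's fileizeURL.
public static class DownloadHelpers
{
    const string Invalid = "\\/:*?\"<>|"; // characters Windows rejects in file names

    // Turns an enclosure URL into a safe local file name.
    public static string FileizeUrl(string url)
    {
        var uri = new Uri(url);
        string name = Uri.UnescapeDataString(uri.Segments[uri.Segments.Length - 1]).Trim('/');
        if (uri.Query.Length > 1)
            name += "_" + Uri.UnescapeDataString(uri.Query.Substring(1));
        if (name.Length == 0) name = "enclosure"; // e.g. a bare "http://host/" URL
        char[] chars = name.ToCharArray();
        for (int i = 0; i < chars.Length; i++)
            if (Invalid.IndexOf(chars[i]) >= 0 || char.IsControl(chars[i]))
                chars[i] = '_';
        return new string(chars);
    }

    // Percentage for a progress bar, given bytes received so far and the enclosure's
    // advertised length (0 when the feed omitted or misreported it).
    public static int ProgressPercent(long bytesTransferred, decimal totalBytes)
    {
        if (totalBytes <= 0) return 0;
        decimal pct = bytesTransferred * 100m / totalBytes;
        return (int)Math.Min(100m, Math.Max(0m, pct));
    }
}
```

Clamping to 100 matters because the length attribute in a feed is only the publisher's claim; the actual file can be larger.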
Although podcasting is new at the time of this writing, its use is growing rapidly, and the number of podcasts available to subscribe to increases every day. Tools such as aggregators, podcast clients (like the one here), and podcast directories encourage this growth. More and better software will reach more consumers, and the technology will continue to grow.
Using the powerful IP*Works! toolkit, you can develop all kinds of innovative and productive applications with relative ease. I hope this sample will prove useful, and encourage expansion and improvements. Please feel free to contact me with any comments, ideas, or improvements that you would like to contribute.