All About Blogs and RSS
- By Richard Koeleman
- Published 08/30/2007
Richard Koeleman has an incredible history online; he has always focused on the development of easy-to-use applications for small and medium-sized companies. Only recently has he decided to start sharing his knowledge, here at http://www.sfolb.com among other places.
Q What is blogging all about?
A First, "blog" is short for Web log. It's a medium in which an author writes a journal-style Web site with provisions for readers to respond. These Web logs are becoming quite valuable in the software community for sharing ideas. Check out blogging on MSDN® at http://blogs.msdn.com.
Q What's the easiest way to set up a Web log?
A The easiest way to set up your own Web log is to go to a site like blogger.com and register as a new user. It provides a Web interface for creating a customized Web log that you can use immediately. There are many other sites like blogger.com that provide support for Web log features.
If you'd like more control over the blogging infrastructure or would like to host your blog on your own server, you can also use one of many blogging applications available today including Radio Userland, Manila, and Movable Type—some of the most popular commercial products. There are also free .NET blogging applications that are easy to use. The most popular are .Text and dasBlog. To set these up, simply download the bits and follow the instructions. You'll be up and running in minutes.
Functionally, both .NET-based applications are fairly equivalent. However, one major difference is that .Text requires a database, either SQL Server™ or MSDE, while dasBlog stores everything in XML files (it's based on the original BlogX framework created by some Microsoft developers). Another difference is that .Text is capable of hosting multiple blogs on a single installation (for example, it's what drives http://blogs.msdn.com today) while dasBlog requires multiple installations. dasBlog has one feature that really stands out called "Mail to Weblog", which allows you to post new entries via e-mail.
The new MSDN blogging site and PDC Bloggers are both good starting places for finding Web logs on any software development topic. Simply browse to one of these sites and read their aggregated feeds. Their feeds will expose you to many individual Web logs and over time you'll naturally find some that you like to read more than others. Then, you can subscribe directly to the individual feeds you enjoy most.
For blogs that specifically cover XML and Web services, check out the list on the MSDN Web Services Developer Center. I personally spend a lot of time on some of these Web logs.
Q What's a feed and how can I subscribe to it?
A A Web log can provide a feed to its content by producing an RSS document available via a well-known URL. An RSS document is an XML file that contains a number of discrete news items, such as entries in a Web log (see Figure 1 for a sample RSS feed). As an XML format, RSS is easily consumed by other programs.
An RSS aggregator is a program that reads RSS documents and displays new items. Most aggregators make it possible to subscribe to a feed by simply entering the URL to the RSS document.
RSS makes reading Web logs easy. Most developers who frequently read Web logs use an aggregator of some sort to help them sift through their subscriptions efficiently. An aggregator makes reading Web logs feel a lot like reading e-mail since it highlights new items and caches items for offline reading (see Figure 2).
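To make the "highlight new items" idea concrete, here is a minimal Python sketch of what an aggregator does with one feed. The sample feed, the example.com URLs, and the function name new_items are all hypothetical, and a real aggregator would download the document from the feed URL and persist its list of seen items between runs.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed; a real aggregator would fetch this from the feed URL.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example Web log</title>
    <link>http://example.com/</link>
    <description>A sample feed</description>
    <item>
      <title>First post</title>
      <link>http://example.com/1</link>
      <description>Hello</description>
    </item>
    <item>
      <title>Second post</title>
      <link>http://example.com/2</link>
      <description>More</description>
    </item>
  </channel>
</rss>"""


def new_items(rss_text, seen_links):
    """Return (title, link) pairs for items whose link is not yet in seen_links.

    seen_links is a set that is updated in place, so a second call with the
    same feed text reports nothing new.
    """
    root = ET.fromstring(rss_text)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="")
        if link not in seen_links:
            fresh.append((item.findtext("title", default=""), link))
            seen_links.add(link)
    return fresh


seen = set()
for title, link in new_items(SAMPLE_RSS, seen):
    print(title, link)
```

Using the item link as the "already seen" key is an assumption made for brevity; feeds that supply a guid element give a more reliable key.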
There are also some online RSS aggregators that consolidate your RSS subscriptions on a separate Web site. This approach has the advantage of being easy to set up and you can access your subscriptions from any computer. The downside, of course, is that you have to be connected to do any reading.
RSS is ultimately what's made blogging such a powerful new form of communication. Before blogging, most developers spent a lot of time sifting through boring, irrelevant posts in order to discover the rare gems that would occasionally appear from people they respect. Blogging puts readers back in control by allowing them to choose which feeds to read, effectively building their own personalized content streams.
Other types of sites can also take advantage of RSS to syndicate content. For example, most of the major news sites including Wired, CNet, Yahoo!, and NPR News provide RSS feeds. Check out Blogdigger and Syndic8 to find sites that support RSS.
At Microsoft, MSDN provides RSS feeds to syndicate new technical content as it's added to the site. The MSDN Just Published feed is a great way to keep up with new MSDN articles and downloads. Even MSDN Magazine has its own RSS feed! Subscribe to http://msdn.microsoft.com/msdnmag/rss/rss.aspx to receive a monthly update on what's in the current issue.
There are many RSS aggregators to choose from today. You can find a fairly complete list at http://blogs.law.harvard.edu/tech/directory/5/aggregators. Some of these are online aggregators while others are desktop applications. Some are free while others charge a fee.
Q Which RSS version is the most current?
A The answer depends on who you ask. There have been several versions of RSS including 0.90, 0.91, 0.92, 0.93, 0.94, 1.0, and 2.0. Making sense of these different versions has been one of the biggest challenges. Understanding them requires a bit of history.
Netscape created the original version of RSS, 0.90, which stood for "RDF Site Summary" or "Rich Site Summary" (the spec says the former was the official name). Netscape invented RSS 0.90 for use in their Web portal activities, but others latched onto the concept and saw more potential uses. Userland Software was one of the first to begin using RSS commercially in their Web log products.
Version 0.90 was heavily based on the W3C's Resource Description Framework (RDF). Many considered the RDF approach overly complex, so a simplified RDF-free version was proposed and labeled 0.91. It was around this time that control of 0.91 passed to Userland Software. Userland Software continued to evolve the simplified spec with several new versions including 0.92, 0.93, and 0.94. To emphasize its focus on simplicity, Userland wanted RSS to stand for "Really Simple Syndication."
As Userland Software continued with its focus on simplicity, another group of developers resurrected the original RDF version (0.90) because RDF promised them more flexibility. They eventually published RSS 1.0, which officially stands for "RDF Site Summary" again. This version is fundamentally different from those controlled by Userland Software because it uses RDF while the others don't. Userland Software didn't like the fact that RSS 1.0 seemed to displace RSS 0.94, so it shipped a new version and bumped the version number up to 2.0.
And that's where it stands today. The split that occurred left two major competing versions: one that's based on RDF (1.0) and one that isn't (2.0), but they both share the same name. This is terribly confusing since the version numbers lead you to believe that 2.0 is an improvement on 1.0 when in reality they're completely different specifications with different goals. Another group of developers has been working to resolve this confusion once and for all by defining a new syndication specification that breaks free from the RSS name. They're calling it Atom, a project I'll discuss in more detail later in this column.
It doesn't matter much which version you use. Most RSS aggregators support all RSS versions (and some even support Atom) without a glitch. The decision mostly comes down to whether you want to use RDF, which is typically fueled by one's belief in the concept of the Semantic Web.
Q What do RSS 1.0 and 2.0 look like?
A The RSS 1.0 and 2.0 formats contain the same core information, but they're structured differently. I've provided a sample RSS 1.0 document (see Figure 1) and the equivalent RSS 2.0 document (see Figure 2) for you to look over.
You'll notice the differences start right at the top with the root element. In RSS 1.0, the root element is rdf:RDF, and in RSS 2.0 it's rss. The rss element also contains a mandatory version attribute to indicate the precise RSS format in use (possible values include 0.91, 0.94, and so forth). Another major difference is that RSS 1.0 documents are namespace-qualified, while RSS 2.0 documents are not. The information contained in both documents, however, is essentially the same.
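Because the two families differ at the root element, a program can tell them apart before reading anything else. The following Python sketch checks only the root element and the version attribute; the function name detect_rss_flavor and the returned labels are my own, not part of either specification.

```python
import xml.etree.ElementTree as ET

# Namespace URI of the rdf: prefix used by RSS 1.0 documents.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def detect_rss_flavor(xml_text):
    """Classify a feed by its root element only (a rough heuristic)."""
    root = ET.fromstring(xml_text)
    # ElementTree reports namespace-qualified names as "{uri}local".
    if root.tag == "{%s}RDF" % RDF_NS:
        return "RSS 1.0 (RDF)"
    # The rss root element is not namespace-qualified and carries a version attribute.
    if root.tag == "rss":
        return "RSS " + root.get("version", "unknown")
    return "unknown"


print(detect_rss_flavor('<rss version="2.0"><channel/></rss>'))
```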
Both versions contain channel elements. A channel element contains three required elements: title, description, and link, as illustrated in the following code:
<channel>
  <title><!-- the channel's title --></title>
  <description><!-- a brief description --></description>
  <link><!-- the channel's URL --></link>
  <!-- optional/extensibility elements go here -->
</channel>
In addition to these required elements, RSS 1.0 defines three additional elements: image, items, and textinput, where image and textinput are optional. RSS 2.0, on the other hand, provides 16 additional elements including image, items, and textinput. Examples of these include language, copyright, managingEditor, pubDate, and category. RSS 1.0 allows for making this type of metadata available through extensibility elements defined in separate XML namespaces.
The main structural difference between the two formats has to do with the representation of item, image, and textinput nodes. In RSS 1.0, the channel element contains references to item, image, and textinput nodes that exist outside of the channel itself. This establishes an RDF association between the channel and the referenced node. In Figure 1, the channel element is associated with an image element and two item elements. In RSS 2.0, the item elements are simply serialized in the channel element (see Figure 2).
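To show what resolving those references involves, here is a small Python sketch that follows the channel's item references in an RSS 1.0 document to the item elements that sit outside the channel. The sample document and its example.com URLs are hypothetical. I write the reference attribute as rdf:resource in the sample and also accept an unqualified resource attribute, since both spellings appear in feeds.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rss": "http://purl.org/rss/1.0/",
}

# Hypothetical RSS 1.0 document: the channel only references its items.
SAMPLE_RSS10 = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns="http://purl.org/rss/1.0/">
  <channel rdf:about="http://example.com/rss">
    <title>Example Web log</title>
    <link>http://example.com/</link>
    <description>A sample feed</description>
    <items>
      <rdf:Seq>
        <rdf:li rdf:resource="http://example.com/1"/>
        <rdf:li rdf:resource="http://example.com/2"/>
      </rdf:Seq>
    </items>
  </channel>
  <item rdf:about="http://example.com/1">
    <title>First post</title>
    <link>http://example.com/1</link>
  </item>
  <item rdf:about="http://example.com/2">
    <title>Second post</title>
    <link>http://example.com/2</link>
  </item>
</rdf:RDF>"""


def channel_item_titles(xml_text):
    """Return item titles in the order the channel references them."""
    root = ET.fromstring(xml_text)
    rdf = "{%s}" % NS["rdf"]
    # Index the top-level item elements by their rdf:about identifier.
    by_about = {item.get(rdf + "about"): item
                for item in root.findall("rss:item", NS)}
    titles = []
    for li in root.findall("rss:channel/rss:items/rdf:Seq/rdf:li", NS):
        ref = li.get(rdf + "resource") or li.get("resource")
        item = by_about.get(ref)
        if item is not None:
            titles.append(item.findtext("rss:title", default="", namespaces=NS))
    return titles


print(channel_item_titles(SAMPLE_RSS10))
```

With RSS 2.0 none of this indirection is needed, because the item elements are children of channel.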
The item element contains the actual news item information. The structure of item is similar across both versions. The item element usually contains title, link, and description elements, as shown in the following code:
<item>
  <title><!-- the item's title --></title>
  <link><!-- the item's URL --></link>
  <description><!-- a brief description --></description>
  <!-- optional/extensibility elements go here -->
</item>
In RSS 1.0, title and link are required, while description is optional. In RSS 2.0, either title or description must be present; everything else is optional. These are the only item elements defined in RSS 1.0, while RSS 2.0 provides several other optional elements including author, category, comments, enclosure, guid, pubDate, and source. RSS 1.0 makes such metadata available through extensibility elements defined in separate XML namespaces known as RSS modules. For example, in Figure 1 the item's date is represented using the Dublin Core module's <dc:date> element.
Check out the RSS 1.0 and 2.0 specifications for complete details on the different formats.
Q So, what is Atom anyway?
A As I mentioned earlier, Atom is the name of a project for developing a new Web log syndication format to address what many feel are the main problems with RSS today (a soup of confusing version numbers, not a truly open standard, inconsistent, poorly defined, and so on). Atom hopes to offer a clean version that addresses everyone's needs. It is designed to be completely vendor neutral, freely extensible by anybody, and thoroughly specified.
Many of today's blogging engines already support the current Atom syndication format. Figure 3 shows a sample Atom 0.3 feed that is equivalent to the RSS feeds shown in Figure 1 and Figure 2. Notice that the Atom feed is namespace qualified but it doesn't use RDF. This gives Atom something in common with both RSS 1.0 and RSS 2.0. It will be interesting to see how Atom's acceptance plays out in the years to come.
In addition to defining a new syndication format, Atom also hopes to define a standard archiving format and a standard Web log editing API (the Atom API). Check out The Atom Project to peruse the specifications and other Atom resources.
Q What's a blogroll?
A A blogroll is simply a collection of Web log feeds. Most bloggers provide a blogroll on their personal Web log. This allows their readers to connect with others who share similar interests or writing styles. Blogrolls facilitate building networks of respect. A blogroll can be exchanged in XML format using the Outline Processor Markup Language (OPML). Figure 4 shows a sample blogroll.
Most blogging engines will manage blogrolls for you and generate the proper XML format when readers request it. Likewise, most aggregators make it possible to import a blogroll and automatically subscribe to the contained feeds. See http://opml.scripting.com for more information on OPML.
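Importing a blogroll amounts to collecting the feed URLs from the outline elements. The Python sketch below assumes the common convention that each outline carries the feed address in an xmlUrl attribute; the sample blogroll and its URLs are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical blogroll; real ones are exported by a blogging engine or aggregator.
BLOGROLL = """<opml version="1.1">
  <head><title>My blogroll</title></head>
  <body>
    <outline text="Example A" type="rss"
             htmlUrl="http://a.example.com/" xmlUrl="http://a.example.com/rss.xml"/>
    <outline text="Example B" type="rss"
             htmlUrl="http://b.example.com/" xmlUrl="http://b.example.com/rss.xml"/>
    <outline text="A folder with no feed of its own"/>
  </body>
</opml>"""


def feed_urls(opml_text):
    """Return the xmlUrl of every outline that has one, in document order."""
    root = ET.fromstring(opml_text)
    return [outline.get("xmlUrl")
            for outline in root.iter("outline")
            if outline.get("xmlUrl")]


print(feed_urls(BLOGROLL))
```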
Q Can you explain what referrers, trackbacks, and pingbacks are?
A Most blogging software makes it possible for readers to add comments to a Web log. It's actually more common, however, for readers to add an entry to their own Web log that links back to the original post. Bloggers like to keep track of when this happens so that new readers can follow the entire conversation.
A referrer is an external site from which a user clicked on a hyperlink to reach your site. Many blogging engines will automatically keep track of referrers as readers navigate to an entry on your Web log. Most engines will display the list of referrers at the bottom of the Web log entry so readers can navigate back to the referrer's site and see what they have to say about the entry, based on the assumption that they probably wrote something about it if they linked to it. The problem with referrers has to do with this assumption—there isn't enough information to tell if the referring page actually contains additional relevant information. In fact, spammers have already taken advantage of this loophole to redirect readers for marketing purposes.
Trackback and pingback are similar specifications developed to remedy this situation. Using trackback or pingback, other bloggers can automatically send a ping to your Web log indicating explicitly that they have written an entry that references a specific post. This type of reverse linking allows your Web log to display a list of all entries that have actually commented on your post in a more explicit manner. Most of today's blogging software supports all of these techniques. See TrackBack Technical Specification and Pingback 1.0.
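For a sense of what a ping looks like on the wire, the Python sketch below builds (but does not send) the two request bodies. As I understand the specifications, a pingback is an XML-RPC call to pingback.ping with the source and target URIs, and a trackback is a form-encoded HTTP POST whose only required field is url. The function names and the example.com URLs are hypothetical, and discovering the endpoint to post to is left out.

```python
import xmlrpc.client
from urllib.parse import urlencode


def pingback_request_body(source_uri, target_uri):
    """XML-RPC payload telling the target's server that source links to target."""
    return xmlrpc.client.dumps((source_uri, target_uri),
                               methodname="pingback.ping")


def trackback_form_body(url, title="", excerpt="", blog_name=""):
    """Form-encoded payload for a trackback ping; empty optional fields are omitted."""
    fields = {"url": url, "title": title,
              "excerpt": excerpt, "blog_name": blog_name}
    return urlencode({name: value for name, value in fields.items() if value})


print(pingback_request_body("http://me.example.com/my-reply",
                            "http://you.example.com/original-post"))
print(trackback_form_body("http://me.example.com/my-reply", title="My reply"))
```

Either body would then be sent with an HTTP POST to the endpoint advertised by the post being referenced.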
Q How can I generate an RSS feed for my Web site?
A Figure 5 illustrates how to generate an RSS 2.0 feed in an .aspx page using an asp:Repeater control. This page assumes that you'll set the control's DataSource property in the codebehind file to the appropriate database resultset.
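The same idea can be sketched outside ASP.NET. The Python function below is not the Figure 5 code; it is a rough equivalent that takes the channel details and a list of entries (standing in for the database resultset) and emits an RSS 2.0 document. The name build_rss and the dictionary keys are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET


def build_rss(channel, entries):
    """Serialize a channel dict and a list of entry dicts as RSS 2.0 text.

    channel must supply title, link, and description; each entry may supply
    any of the same three keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel_el = ET.SubElement(rss, "channel")
    for name in ("title", "link", "description"):
        ET.SubElement(channel_el, name).text = channel[name]
    for entry in entries:
        item = ET.SubElement(channel_el, "item")
        for name in ("title", "link", "description"):
            if name in entry:
                ET.SubElement(item, name).text = entry[name]
    # ElementTree escapes characters such as & and < in element text.
    return ET.tostring(rss, encoding="unicode")


print(build_rss(
    {"title": "Example Web log", "link": "http://example.com/",
     "description": "A sample feed"},
    [{"title": "First post", "link": "http://example.com/1",
      "description": "Hello & welcome"}],
))
```

Building the document through an XML API rather than string concatenation is a deliberate choice here, since it keeps entry text containing markup characters from producing a malformed feed.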
Q I'd like to aggregate several RSS feeds and display the information on my personal Web site. Can you explain how to do this?
A Since RSS feeds are XML files, accomplishing this is an exercise in using your favorite XML API, such as System.Xml in the Microsoft .NET Framework. Figure 6 contains the code for an ASP.NET Web user control that I wrote to aggregate the RSS feeds listed in a blogroll file (.opml). The code assumes that the opml element will contain a numberToDisplay attribute to indicate how many items from each feed you want to display.

Figure 7 ASP.NET Web User Control
You can drop this control into any .aspx page and it will display items from the various feeds listed in the blogroll. Figure 7 shows this control in action on the Utah .NET User Group Web site.
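For readers not using ASP.NET, the aggregation logic itself is small. The Python sketch below is not the Figure 6 control; it follows the same outline under a few assumptions: the blogroll lists feeds in xmlUrl attributes, the opml element carries the numberToDisplay attribute described above (defaulting to 5 here if absent), every feed is RSS 2.0, and the caller supplies a fetch function so the example runs without network access.

```python
import xml.etree.ElementTree as ET


def aggregate(opml_text, fetch):
    """Return (feed title, item title, item link) for the first few items of each feed.

    fetch is a caller-supplied function that maps a feed URL to RSS 2.0 text.
    """
    opml = ET.fromstring(opml_text)
    limit = int(opml.get("numberToDisplay", "5"))
    results = []
    for outline in opml.iter("outline"):
        url = outline.get("xmlUrl")
        if not url:
            continue  # skip outlines that are not feed subscriptions
        feed = ET.fromstring(fetch(url))
        feed_title = feed.findtext("channel/title", default="")
        for item in list(feed.iter("item"))[:limit]:
            results.append((feed_title,
                            item.findtext("title", default=""),
                            item.findtext("link", default="")))
    return results


# Hypothetical data standing in for a blogroll file and a downloaded feed.
OPML = ('<opml version="1.1" numberToDisplay="1"><body>'
        '<outline text="A" xmlUrl="http://a.example.com/rss.xml"/>'
        '</body></opml>')
FEED = ('<rss version="2.0"><channel><title>A</title>'
        '<link>http://a.example.com/</link><description>d</description>'
        '<item><title>one</title><link>http://a.example.com/1</link></item>'
        '<item><title>two</title><link>http://a.example.com/2</link></item>'
        '</channel></rss>')

print(aggregate(OPML, lambda url: FEED))
```

In a live page the fetch function would download each URL, ideally with caching so the feeds are not requested on every page view.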
Q Are there any Web service APIs for interacting with Web logs?
A Many blogging engines provide their own proprietary Web service interface for interacting with a Web log programmatically, but I wouldn't say a standard has emerged yet.
Both .Text and dasBlog provide some .asmx endpoints that provide editing functionality via SOAP, but their interfaces are different. Blogger.com provides an interactive API (Blogger API) based on XML-RPC. Userland Software enhanced the Blogger API and called it the MetaWeblog API. These are probably the most widely recognized Web log APIs today, but still not all Web log engines support them. There is also a separate API for adding comments called the Comment API, but again, it's not universally supported.
The Atom group is currently working to resolve this mess. The Atom API defines a standard Web log API for publishing and editing all Web log content. You can check out their work at The Atom Project.
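To give a feel for the XML-RPC style of the Blogger and MetaWeblog APIs mentioned above, here is a Python sketch that builds (without sending) a metaWeblog.newPost request. To my understanding the call takes a blog ID, user name, password, a struct whose members mirror RSS item elements, and a publish flag; the helper name, credentials, and post content below are placeholders.

```python
import xmlrpc.client


def new_post_request(blog_id, username, password, title, body, publish=True):
    """Return the XML-RPC request body for a metaWeblog.newPost call."""
    # The post struct borrows its member names from the RSS item element.
    post = {"title": title, "description": body}
    return xmlrpc.client.dumps((blog_id, username, password, post, publish),
                               methodname="metaWeblog.newPost")


print(new_post_request("1", "someuser", "secret",
                       "Hello", "My first post via XML-RPC"))
```

Against a live server, xmlrpc.client.ServerProxy pointed at the engine's XML-RPC endpoint would issue the same call directly; the endpoint URL varies by engine.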
Source: http://msdn.microsoft.com/msdnmag/issues/04/04/XMLFiles/
