Plugging RDF Content Into Your Website With PHP
Plugging RDF Content Into Your Website With PHP
Table of Contents
Fame And Immense Fortune.............................................................................................................................1
Switching Channels.............................................................................................................................................3
Fresh Meat...........................................................................................................................................................7
Nesting Time......................................................................................................................................................14
Back To Class....................................................................................................................................................18
A Free Lunch.....................................................................................................................................................25
Homework.........................................................................................................................................................30
i
Fame And Immense Fortune
Imagine a site that carried the latest news from the hottest portals on the Web. Stock prices, weather
information, news stories, threaded discussions, software releases...all updated dynamically, on an hourly
basis, without any manual intervention. Imagine the traffic such a site would get. Imagine the advertising
dollars that would pour in, and the adulation that the webmaster would receive.
Now stop imagining and start reading, because, if you pay close attention, that webmaster could be you and
that site could be yours. All you need is a little imagination, some clever PHP coding and a couple of free RSS
files. And, obviously, the remaining nine pages of this article.
RSS (the acronym stands for RDF Site Summary) is a format originally devised by Netscape to distribute
information about the content on its My.Netscape.Com portal. The format has gone through many iterations
since its introduction in early 1997 − take a look at the end of this article for links to information on RSS's
long and complicated history − but seems to have stabilized at RSS 1.0, an RDF−compliant version that is
both lightweight and full−featured.
RSS makes it possible for webmasters to publish and distribute information about what's new and interesting
on a particular site at a particular point in time. This information, which could range from a list of news
articles to stock market data or weather forecasts, is published as a well−formed XML document, and can
therefore be parsed, processed and rendered by any XML parser.
By making it possible to distribute a frequently−updated list of the latest information about a particular Web
site, RSS opens the door to simple, easy content syndication over the Web. In order to understand how,
consider the following simple example:
Site A, a news site (a "content syndicator"), could publish, on a hourly basis, an RSS document containing a
list of the latest news from around the world, together with links to the full article on the site. This RSS
document could be picked up by another Web site (Site B, a "content aggregator"), parsed, and displayed on
Site B's index page. Every time Site A publishes a new RSS document, Site B's index page automatically gets
updated with the latest news.
This kind of arrangement works for both organizations involved in the transaction. Since the links within the
RSS document all point to articles on Site A, Site A will immediately experience an increase in visitors and
site traffic. And Site B's webmaster can take the week off, since he now has a way to automagically update his
index page, simply by linking it to the dynamic content being published by Site A.
Quite a few popular Web sites make an RSS or RDF news feed available to the public at large. Freshmeat,
(http://www.freshmeat.net) and Slashdot (http://www.slashdot.org) both have one, and so do many others. I'll
be using Freshmeat's RDF file extensively over the course of this article; however, the techniques described
here can be applied to any RSS 1.0 or RDF file.
<channel rdf:about="http://www.melonfire.com/">
<title>Trog</title>
<description>Well−written technical articles and
tutorials on Web technologies</description>
<link>http://www.melonfire.com/community/columns/trog/</link>
<items>
<rdf:Seq>
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=100" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=71" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=62" />
</rdf:Seq>
</items>
</channel>
<item
rdf:about="http://www.melonfire.com/community/columns/trog/article.php?i
d=10
0">
<title>Building A PHP−Based Mail Client (part 1)</title>
<link>http://www.melonfire.com/community/columns/trog/article.php?id=100
</li
nk>
<description>Ever wondered how Web−based mail clients
work? Find out here.</description>
</item>
Switching Channels 3
Plugging RDF Content Into Your Web Site With PHP
<item
rdf:about="http://www.melonfire.com/community/columns/trog/article.php?i
d=71">
<title>Using PHP With XML (part 1)</title>
<link>http://www.melonfire.com/community/columns/trog/article.php?id=71<
/link>
<description>Use PHP's SAX parser to parse XML data and
generate HTML pages.</description>
</item>
<item
rdf:about="http://www.melonfire.com/community/columns/trog/article.php?i
d=62">
<title>Access Granted</title>
<link>http://www.melonfire.com/community/columns/trog/article.php?id=62<
/link>
<description>Precisely control access to information
with the mySQL grant tables.</description>
</item>
</rdf:RDF>
As you can see, an RDF file is split up into very clearly−demarcated sections. First comes the document
prolog,
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22−rdf−syntax−ns#"
xmlns="http://purl.org/rss/1.0/">
This is followed by a <channel> block, which contains general information on the channel that is described by
this RDF file. In the example above, the channel is Melonfire's Trog column, which gets updated every week
with new technical articles and tutorials.
<channel rdf:about="http://www.melonfire.com/">
<title>Trog</title>
<description>Well−written technical articles and
tutorials on Web technologies</description>
Switching Channels 4
Plugging RDF Content Into Your Web Site With PHP
<link>http://www.melonfire.com/community/columns/trog/</link>
<items>
<rdf:Seq>
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=100" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=71" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=62" />
</rdf:Seq>
</items>
</channel>
The <channel> block contains an <items> block, which contains a sequential list of all the resources described
within the RDF document. This list is formulated as a series of <li /> elements, which may be familiar to you
from HTML. Every resource in this block corresponds to a resource described in greater detail in a subsequent
<item> block (further down).
<items>
<rdf:Seq>
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=100" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=71" />
<li
rdf:resource="http://www.melonfire.com/community/columns/trog/article.ph
p?id
=62" />
</rdf:Seq>
</items>
It's also possible to intersperse an <image> block at this stage, in case you'd like to publish the URL of your
channel's logo.
And now for the meat. Every <item> block in an RSS 1.0 document describes a single resource in greater
Switching Channels 5
Plugging RDF Content Into Your Web Site With PHP
detail, providing a title, a URL and a description of that resource.
<item
rdf:about="http://www.melonfire.com/community/columns/trog/article.php?i
d=71">
<title>Using PHP With XML (part 1)</title>
<link>http://www.melonfire.com/community/columns/trog/article.php?id=71<
/link>
<description>Use PHP's SAX parser to parse XML data and
generate HTML pages.</description>
</item>
In this example, the <item> block above describes a single article in the Trog "channel", providing a
description and title for it, and including a URL so that a content aggregator can create backward links to it.
As you can see, an RSS 1.0 file is fairly straightforward, and it's quite easy to create one, either manually or
programmatically. The example and explanation above was merely illustrative; there's a lot more you can do
with RSS 1.0 and RDF in general, and you should take a look at the links provided at the end of this article for
more information. Until we get there, though, let's spend a few more minutes discussing how an RSS 1.0
document like the one above can be plugged into your own Web site.
Switching Channels 6
Fresh Meat
Since RSS is, technically, well−formed XML, it can be processed using standard XML programming
techniques. There are two primary techniques here: SAX (the Simple API for XML) and DOM (the Document
Object Model).
A SAX parser works by traversing an XML document and calling specific functions as it encounters different
types of tags. So, for example, I might call a specific function to process a starting tag, another function to
process an ending tag, and a third function to process the data between them. The parser's responsibility is
simply to sequentially traverse the document; the functions it calls are responsible for processing the tags
found. Once the tag is processed, the parser moves on to the next element in the document, and the process
repeats itself.
A DOM parser, on the other hand, works by reading the entire XML document into memory, converting it
into a hierarchical tree structure, and then providing an API to access different tree nodes (and the content
attached to each). Recursive processing, together with API functions that allow developers to distinguish
between different types of nodes (element, attribute, character data, comment et al), make it possible to
perform different actions depending on both node type and node depth within the tree.
SAX and DOM parsers are available for almost every language, including your favourite and mine, PHP. I'll
be using PHP's SAX parser to process the RDF examples in this article; however, it's just as easy to perform
equivalent functions using the DOM parser.
Let's look at a simple example to put all this in perspective. Here's the RDF file I'll be using, culled directly
from http://www.freshmeat.net/ :
Fresh Meat 7
Plugging RDF Content Into Your Web Site With PHP
<dc:date>2002−02−11T10:20+00:00</dc:date>
<items>
<rdf:Seq>
<rdf:li rdf:resource="http://freshmeat.net/releases/69583/" />
<rdf:li rdf:resource="http://freshmeat.net/releases/69581/" />
</rdf:Seq>
</items>
<image rdf:resource="http://freshmeat.net/img/fmII−button.gif"
/>
<textinput rdf:resource="http://freshmeat.net/search/" />
</channel>
<image rdf:about="http://freshmeat.net/img/fmII−button.gif">
<title>freshmeat.net</title>
<url>http://freshmeat.net/img/fmII−button.gif</url>
<link>http://freshmeat.net/</link>
</image>
<item rdf:about="http://freshmeat.net/releases/69583/">
<title>sloop.splitter 0.2.1</title>
<link>http://freshmeat.net/releases/69583/</link>
<description>A real time sound effects program.</description>
<dc:date>2002−02−11T04:52−06:00</dc:date>
</item>
<item rdf:about="http://freshmeat.net/releases/69581/">
<title>apacompile 1.9.9</title>
<link>http://freshmeat.net/releases/69581/</link>
<description>A full−featured Apache compilation
HOWTO.</description>
<dc:date>2002−02−11T04:52−06:00</dc:date>
</item>
</rdf:RDF>
And here's the PHP script to parse it and display the data within it:
<?php
// XML file
$file = "fm−releases.rdf";
Fresh Meat 8
Plugging RDF Content Into Your Web Site With PHP
$currentTag = "";
$flag = "";
// create parser
$xp = xml_parser_create();
// parse data
while ($xml = fread($fp, 4096))
{
if (!xml_parse($xp, $xml, feof($fp)))
{
die("XML parser error: " .
xml_error_string(xml_get_error_code($xp)));
}
}
// destroy parser
xml_parser_free($xp);
Fresh Meat 9
Plugging RDF Content Into Your Web Site With PHP
if ($name == "ITEM")
{
echo "<hr>";
$flag = 0;
}
}
?>
Fresh Meat 10
Capture The Flag
The first step in this script is to set up some global variables:
// XML file
$file = "fm−releases.rdf";
The $currentTag variable will hold the name of the element that the parser is currently processing − you'll see
why this is needed shortly.
Since my ultimate goal is to display the individual items in the channel, with links, I also need to know when
the parser has exited the <channel>...</channel> block and entered the <item>...</item> sections of the
document. Since I'm using a SAX parser, which operates in a sequential manner, there is no parser API
available to discover depth or location in the tree. So I have to invent my own mechanism to do this − which
is where the $flag variable comes in.
The $flag variable will be used to find out if the parser is within the <channel> block or the <item> block.
The next step is to initialize the SAX parser and begin parsing the RSS document.
// create parser
$xp = xml_parser_create();
// parse data
while ($xml = fread($fp, 4096))
{
if (!xml_parse($xp, $xml, feof($fp)))
{
die("XML parser error: " .
xml_error_string(xml_get_error_code($xp)));
// destroy parser
xml_parser_free($xp);
This is a fairly standard sequence of commands, and the comments should explain it sufficiently. The
xml_parser_create() function is used to instantiate the parser, and assign it to the handle $xp. Next, callback
functions are set up to handle opening and closing tags, and the character data within them. Finally, the
xml_parse() function, in conjunction with a bunch of fread() calls, is used to read the RDF file and parse it.
Each time an opening tag is encountered in the document, the opening tag handler elementBegin() is called.
This function receives, as function argument, the name of the current tag and its attributes (if any). This tag
name is assigned to the global $currentTag variable, and − if the tag is an opening <item> tag − the $flag
variable is set to 1.
Conversely, when a closing tag is found, the closing tag handler elementEnd() is invoked.
Now, what about the data between the tags, which is what we're really interested in? Say hello to the character
data handler, characterData().
Now, if you look at the arguments passed to this function, you'll see that characterData() only receives the
data between the opening and closing tag − it has no idea which particular tag the parser is currently
processing. Which is why we needed the global $currentTag variable in the first place (told you this would
make sense eventually!)
If the value of $flag is 1 − in other words, if the parser is currently within an <item>...</item> block − and if
the element currently being processed is either a <title>, <link> or <description> element, then the data is
printed to the output device (in this case, the Web browser), followed by a line break.
The entire RDF document will be processed in this sequential manner, with output appearing every time an
item is found. Here's what you'll see when you run the script:
<html>
<head>
<basefont face="Verdana">
</head>
<body>
<?php
// XML file
$file = "http://www.freshmeat.net/backend/fm−releases.rdf";
Nesting Time 14
Plugging RDF Content Into Your Web Site With PHP
$flag = 1;
}
else if ($name == "CHANNEL")
{
$flag = 2;
}
}
Nesting Time 15
Plugging RDF Content Into Your Web Site With PHP
// create parser
$xp = xml_parser_create();
// parse data
while ($xml = fread($fp, 4096))
{
if (!xml_parse($xp, $xml, feof($fp)))
{
die("XML parser error: " .
xml_error_string(xml_get_error_code($xp)));
}
}
// destroy parser
xml_parser_free($xp);
?>
</table>
</body>
</html>
The major difference between this iteration and the previous one is that this one creates two arrays to hold the
information extracted during the parsing process. The $channel array is an associative array which holds basic
descriptive information about the channel being processed, while the $items array is a two−dimensional array
containing information on the individual channel items. Every element of the $items array is itself an
associative array, with named keys for the item title, URL and description; the total number of elements in
$items is equal to the total number of <item> blocks in the RDF document.
Nesting Time 16
Plugging RDF Content Into Your Web Site With PHP
Note also the change to the $flag variable, which now holds two values depending on whether a
<channel>...</channel> or <item>...</item> block is being processed. This is necessary so that the parser puts
the information into the appropriate array.
Once the document has been fully parsed, it's a simple matter to iterate through the $items array and print
each item as a table row. Here's what the end result looks like:
Nesting Time 17
Back To Class
Now, with all this power at your command, why on earth would you want to restrict yourself to just a single
RDF feed? As I said earlier, most major sites make snapshots of their content available on a regular basis...and
it's fairly simple to plug all these different feeds into your own site. Let's see how.
The first thing to do is modularize the code from the previous example, so that you don't need to repeat it over
and over again for every single feed. I'm going to simplify things by packaging it into a class, and including
the class file into my final PHP script.
<?
class RDFParser
{
//
// variables
//
//
// methods
//
Back To Class 18
Plugging RDF Content Into Your Web Site With PHP
// parse data
while ($xml = fread($fp, 4096))
{
if (!xml_parse($this−>xp, $xml, feof($fp)))
{
die("XML parser error: " .
xml_error_string(xml_get_error_code($this−>xp)));
}
}
// destroy parser
xml_parser_free($this−>xp);
}
Back To Class 19
Plugging RDF Content Into Your Web Site With PHP
$this−>flag = 1;
}
else if ($name == "CHANNEL")
{
$this−>flag = 2;
}
}
$this−>items[$this−>count][strtolower($this−>currentTag)] .=
$data;
}
else if ($this−>flag == 2)
{
$this−>channel[strtolower($this−>currentTag)] .= $data;
}
}
}
Back To Class 20
Plugging RDF Content Into Your Web Site With PHP
function getChannelInfo()
{
return $this−>channel;
}
}
?>
If you're familiar with PHP classes, this should be fairly simple to understand. If you're not, flip forward to the
end of this article for a link to a great article on how classes work, and then read the listing above again.
Before we proceed to use this class, I'm going to take a few minutes out to point out one line from it − the call
to xml_set_object() above.
Now, how about using this class to actually generate a Web page with multiple content feeds?
<?
include("class.RDFParser.php");
// how many items to display in each channel
$maxItems = 5;
?>
<html>
<head>
<basefont face="Verdana">
<body>
Back To Class 21
Plugging RDF Content Into Your Web Site With PHP
?>
The latest from <a href=<? echo $f_channel["link"]; ?>><? echo
$f_channel["title"]; ?></a> <br> <ul> <? // iterate through
items array
for ($x=0; $x<$maxItems; $x++) {
if (is_array($f_items[$x]))
{
// print data
$item = $f_items[$x];
echo "<li><a href=" . $item["link"] . ">" .
$item["title"] . "</a>";
}
}
?>
</ul>
</font>
</td>
Back To Class 22
Plugging RDF Content Into Your Web Site With PHP
</font>
</td>
</tr>
</table>
</body>
</head>
</html>
This is fairly simple. Once an instance of the class has been instantiated with the "new" keyword,
$f = new RDFParser();
class methods can be used to set the location of the RDF file to be parsed
$f−>setResource("http://www.freshmeat.net/backend/fm−releases.rdf");
$f−>parseResource();
and to retrieve the $channel and $items arrays for further processing.
<?
$f_channel = $f−>getChannelInfo();
$f_items = $f−>getItems();
?>
Back To Class 23
Plugging RDF Content Into Your Web Site With PHP
Each time you reload the script above, the appropriate RDF files will be fetched from the specified locations,
parsed and displayed in the format required.
If you have a high−traffic site, you might find this constant "pull" to be more of a nuisance than anything else,
especially if the RDF data in question is not updated all that often. In that case, it might be wise to explore the
option of caching the RDF data locally, either by extending the example above to include caching features, or
by using a cron job to download a local copy of the latest RDF file to your Web server every couple hours,
and using the local copy instead of the "live" one.
Back To Class 24
A Free Lunch
Now, the class I've written above is meant primarily for illustrative purposes, and perhaps for low−traffic
sites. If you're looking for something more professional, there are a number of open−source RDF parsers out
there which bring additional features (including caching) to the table. Let's look at some examples of how
they can be used.
The first of these is the RDF parser class, developed by Stefan Saasen for fase4 and available for free
download from http://www.fase4.com/rdf/. This is a very full−featured RDF parser, and includes support for
caching and authentication via proxy. Here's an example of how it might be used:
<html>
<head>
<style type="text/css">
body {font−family: Verdana; font−size: 11px;}
.fase4_rdf {font−size: 13px; font−family: Verdana}
.fase4_rdf_title
{font−size: 13px; font−weight : bolder;}
</style>
</head>
<body>
<?
// include class
include("rdf.class.php");
// instantiate object
$rdf = new fase4_rdf;
An option to this is the PHP RDF parser class developed by Jason Williams, available at
http://www.nerdzine.net/php_rdf/ . This is a bare−bones PHP class, which exposes a few basic methods, but a
large number of properties that you can manipulate to format the processed data to your satisfaction.
A Free Lunch 25
Plugging RDF Content Into Your Web Site With PHP
<html>
<head>
<basefont face="Verdana">
</head>
<body link="Red" vlink="Red" alink="Red">
<?
include("rdf_class.php");
Documentation for both these classes is available on their respective Web sites.
A Free Lunch 26
Adding A Little Style
In case you don't like the thought of iterating through all those PHP arrays and marking them up with HTML,
you also have the option to format and display the data using an XSLT stylesheet. PHP 4.1 comes with
support for the Sablotron XSLT processor via a new XSLT API, which can be used to combine an XSLT
stylesheet with an XML (or, in this case, RDF) document to easily transform XML markup into
browser−readable HTML.
I'm not going to get into the details of this − take a look at the PHP manual, or at the links at the end of the
article, for detailed information − but I will demonstrate what I mean with a simple example. First, here's the
stylesheet:
<?xml version="1.0"?>
<xsl:stylesheet
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:rdf="http://www.w3.org/1999/02/22−rdf−syntax−ns#"
xmlns:rss="http://purl.org/rss/1.0/"
xmlns:dc="http://purl.org/dc/elements/1.1/" version="1.0">
</xsl:stylesheet>
And here's the PHP script that combines the stylesheet above with the Freshmeat RDF document described
previously to generate an HTML page:
<?php
// XML file
// this needs to be a local file
$xml = "fm−releases.rdf";
// XSLT file
$xslt = "fm.xsl";
// clean up
xslt_free($xp);
?>
Fairly simple and self−explanatory again, I think. The two documents are merged to produce the following
composite output:
This is an alternative, and perhaps simpler (though not all that optimal), method of turning RDF data
browser−readable HTML. Note, however, that you will need to have an external program running (probably
via cron) to update your local copy of the RDF file on a regular basis, since the PHP XSLT processor may
have trouble accessing a remote file.
Till next time...stay healthy! Note: All examples in this article have been tested on Linux/i386 with Apache
1.3.12 and PHP 4.1.1. Examples are illustrative only, and are not meant for a production environment.
Melonfire provides no warranties or support for the source code described in this article. YMMV!
Homework 30