Monday 8 March 2010

Semantic wikis, content management systems and linked data

For some time I've been interested in connections between human-readable web pages and machine-processable data on the web, specifically data that can be presented as RDF. Initiatives like RDFa, GRDDL and Microformats offer ways to include both in the same web resource, but don't entirely address the problem of authoring both through a common system or interface.

An early promising approach for addressing this problem was semantic wikis, exemplified by Semantic MediaWiki. For limited purposes, these work really nicely, but marking up data for machine processing often doesn't appear to yield sufficient benefit to justify the additional effort.

I recently attended the 2010 JISC developer meeting, Dev8D (http://dev8d.org/, http://wiki.2010.dev8d.org/w/Main_Page), and one of the topics that lit my fire was Drupal 7. This new release of Drupal (in alpha at the time of writing) brings RDF data into the system's core. I heard that the structure of any content delivered by Drupal can also be exposed as RDF, either as RDFa through the normal web interface, or via a queryable SPARQL endpoint. Wow! All that data accessible as linked data!

So I immediately set about redirecting a nascent project idea, which I'd originally conceived to be based on Semantic MediaWiki, to use Drupal 7 instead. I haven't made a lot of progress, but considering the new approach prompted me to think about the relative strengths of semantic wikis and "semantic content management systems", which is a description I've just minted to describe systems like Drupal 7.

My original mini-project idea was to collect prose descriptions of technical topics and experts, and use typed links to capture relationships between them, in a way that might conceivably be useful for finding someone of whom to ask a focused technical question. The notion of starting with unconstrained free text was appealing, as relevant structure is not always evident when information is first assembled or recorded. On hearing about RDF support in Drupal 7, I contemplated using it as an alternative way to capture this loosely structured information, anticipating that Drupal's support for linking between nodes would allow the structure to emerge from free-form textual descriptions in much the same way as a semantic wiki.
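To make the typed-link idea a little more concrete, here is a minimal Python sketch. It is purely illustrative and not taken from any actual project code: the people, the topics and the relation names "has_expertise_in" and "is_subtopic_of" are all invented for the example, and the links are held in a plain list rather than in a wiki or an RDF store.

```python
from typing import NamedTuple


class Link(NamedTuple):
    """A typed link between two pages: subject --relation--> target."""
    subject: str
    relation: str
    target: str


# Hypothetical pages and typed links, invented for illustration only.
LINKS = [
    Link("Alice", "has_expertise_in", "RDFa"),
    Link("Bob", "has_expertise_in", "SPARQL"),
    Link("RDFa", "is_subtopic_of", "RDF"),
    Link("SPARQL", "is_subtopic_of", "RDF"),
]


def subtopics(topic, links):
    """Return the topic plus every topic reachable via is_subtopic_of links."""
    found = {topic}
    frontier = [topic]
    while frontier:
        current = frontier.pop()
        for link in links:
            if (link.relation == "is_subtopic_of"
                    and link.target == current
                    and link.subject not in found):
                found.add(link.subject)
                frontier.append(link.subject)
    return found


def experts_on(topic, links):
    """Return the people linked to the topic, or to any of its subtopics."""
    topics = subtopics(topic, links)
    return sorted({
        link.subject
        for link in links
        if link.relation == "has_expertise_in" and link.target in topics
    })


print(experts_on("RDF", LINKS))
```

Asking about the broad topic "RDF" finds both invented experts, because their narrower topics are linked to it; asking about "SPARQL" finds only the one linked directly.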

Since then, I have noticed that a content management system (CMS) imposes a greater degree of uniformity between all object descriptions than a semantic wiki. At heart, the CMS stores information in relational database tables, where each record (row) of a given table follows broadly the same pattern. Of course, different types of record can have very different structures. But, to enter information into such a system, one must first decide what type of object is being described, and this in turn circumscribes the structure of information that can be entered.

In contrast, a semantic wiki element is first and foremost a free text description, within which elements of structure may occasionally be discerned, identified and marked up for semantic analysis. But where there is no such structure, the text may still stand alone, as an unconstrained description free of predetermined structure.

So it seems that semantic wikis and content management systems approach the same goal of combined machine-processable and human-readable information from entirely different directions. The semantic wiki starts from unstructured free-format text, within which structure can be discerned and encoded with suitable ad-hoc effort. The CMS starts from structured data whose individual elements may be unstructured free-form text. The inherently structured CMS approach makes it easier to capture predetermined structure, while the unstructured wiki approach makes it easier to enter information that lacks structure, or whose structure is yet to be determined.
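The two directions can be sketched side by side in Python. This is a rough illustration under stated assumptions, not a description of how either system is implemented: the regular expression handles only the simplest form of Semantic MediaWiki's [[property::value]] annotation (no alternative labels or multiple values), and the ExpertRecord type with its fields is a hypothetical stand-in for a CMS content type.

```python
import re
from dataclasses import dataclass

# Wiki direction: free text comes first, and structure is extracted only
# where the author has chosen to annotate it. Simplified pattern: it matches
# [[property::value]] and nothing more elaborate.
ANNOTATION = re.compile(r"\[\[([^\[\]:|]+)::([^\[\]|]+)\]\]")


def extract_properties(page_title, wiki_text):
    """Return (subject, property, value) triples found in annotated text."""
    return [
        (page_title, prop.strip(), value.strip())
        for prop, value in ANNOTATION.findall(wiki_text)
    ]


# CMS direction: the type of object is chosen first, and that choice fixes
# which fields exist. Free text lives inside one of those fields.
@dataclass
class ExpertRecord:  # hypothetical content type
    name: str
    expertise: str
    biography: str = ""  # unstructured free-form text


def record_properties(record):
    """Return (subject, property, value) triples from the structured fields."""
    return [(record.name, "expertise", record.expertise)]


wiki_page = "Alice has worked on [[expertise::RDFa]] for some years."
plain_page = "Bob is often seen at developer meetings."
cms_record = ExpertRecord("Alice", "RDFa", "Has worked on RDFa for some years.")

print(extract_properties("Alice", wiki_page))
print(extract_properties("Bob", plain_page))
print(record_properties(cms_record))
```

The unannotated wiki page is still a perfectly valid page that simply yields no data, whereas the CMS record cannot be created at all without first supplying a value for each required field.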

Is there a point where these two approaches meet, combining the advantages of both? That is, easy entry of unstructured information combined with easy capture or extraction of structure? I don't know, but I have some ideas that might make this possible. Without previously realizing it, I think this is one factor behind my work on Shuffl (http://code.google.com/p/shuffl/), but that project is still some way from realizing this in a usable fashion.

This is a topic that I shall continue to think about.
