Semantic Blogging and Structured Search
Hot on the heels of Dan’s Linkstacking paper comes Jon Udell’s latest about similar issues. As he puts it:
The next phase of my structured search project is coming to life. For the new version I’m parsing all 200+ of the RSS feeds to which I subscribe, XHTML-izing the content, storing it in Berkeley DB XML, and exposing it to the same kinds of searches I’ve been applying to my own content.
He’s using Tidy to convert the HTML content to XHTML, which is then available to standard XML query and processing tools. Very, very cool!
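To make the payoff concrete, here is a minimal sketch of what "standard XML query tools" buys you once the content is well-formed. It assumes the Tidy step has already run; the XHTML fragment, the example.org URLs, and the `extract_links` helper are all hypothetical stand-ins, and Python's built-in ElementTree is used here in place of Berkeley DB XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment, standing in for what Tidy might emit
# for the body of a single feed item.
XHTML_ITEM = """<div xmlns="http://www.w3.org/1999/xhtml">
  <p>See <a href="http://example.org/linkstacking">the Linkstacking paper</a>
  and <a href="http://example.org/structured-search">structured search</a>.</p>
</div>"""

# Prefix-to-namespace map used by the XPath-style query below.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def extract_links(xhtml):
    """Return (href, link text) pairs for every anchor that has an href."""
    root = ET.fromstring(xhtml)
    return [
        (a.get("href"), "".join(a.itertext()).strip())
        for a in root.iterfind(".//x:a[@href]", NS)
    ]


links = extract_links(XHTML_ITEM)
for href, text in links:
    print(href, "->", text)
```

The same query would fail on typical tag-soup HTML, which is why the XHTML-izing step matters.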
This is the kind of thing that really needs to be picked up in the scholarly community, something à la the Semantic Blogging Demo, where feeds also contain bibliographic metadata. At some point I’d really like to be able to:
- store content and (bibliographic) metadata in the same XML DB
- export that content and metadata in a variety of forms, from academic papers to public blog posts
- in blog posts, mark up a citation so that it refers to the ID of a bibliographic record, which is itself embedded as RDF in a feed
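
As a rough illustration of that last item, the sketch below keeps a post and its bibliographic record in one XML document and resolves a citation by ID. Everything specific here is an assumption made for the example: the `entry` wrapper, the `#doe2004` identifier, the placeholder record, the use of Dublin Core properties, and the `resolve_citations` helper. It is not the Semantic Blogging Demo's actual format.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {
    "rdf": RDF,
    "dc": "http://purl.org/dc/elements/1.1/",
    "x": "http://www.w3.org/1999/xhtml",
}

# Hypothetical feed entry: an RDF bibliographic record plus XHTML content
# whose <cite> links to the record's ID. The record itself is placeholder data.
ENTRY = """<entry xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:x="http://www.w3.org/1999/xhtml">
  <rdf:RDF>
    <rdf:Description rdf:about="#doe2004">
      <dc:creator>Doe, Jane</dc:creator>
      <dc:title>An Example Title</dc:title>
      <dc:date>2004</dc:date>
    </rdf:Description>
  </rdf:RDF>
  <x:div>
    <x:p>As argued in <x:cite><x:a href="#doe2004">Doe</x:a></x:cite>, this works.</x:p>
  </x:div>
</entry>"""


def resolve_citations(entry_xml):
    """Format each in-text citation from the record its href points to."""
    root = ET.fromstring(entry_xml)

    # Index the bibliographic records by their rdf:about identifier.
    records = {}
    for desc in root.iterfind(".//rdf:Description", NS):
        records[desc.get("{%s}about" % RDF)] = {
            "creator": desc.findtext("dc:creator", namespaces=NS),
            "title": desc.findtext("dc:title", namespaces=NS),
            "date": desc.findtext("dc:date", namespaces=NS),
        }

    # Look up every cited ID; citations with no matching record are skipped.
    formatted = []
    for link in root.iterfind(".//x:cite/x:a", NS):
        rec = records.get(link.get("href"))
        if rec is not None:
            formatted.append(
                "%s (%s). %s." % (rec["creator"], rec["date"], rec["title"])
            )
    return formatted


citations = resolve_citations(ENTRY)
print(citations)
```

Because the record lives alongside the content, the same source could in principle be rendered as a blog post with linked citations or as a paper with a formatted reference list.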