PyULike vs. SmartFox: Centralized vs. Distributed

CiteULike’s Richard Cameron has posted an interesting outline of a plan to rewrite the site’s code in Python, a project tentatively called PyULike. Meanwhile, last week I heard from one of the developers of a promising new Firefox plug-in, still in development, called Firefox Scholar (we are discussing integrating my CSL language for citation processing, as well as import/export formats). Each project attempts to solve very real problems for scholars, researchers, and students, but in quite different ways.

The problems are:

  1. How can you best integrate reference management seamlessly into modern web-focused research workflows? As a user, I spend a lot of time working with documents sourced from the web, so why should I then have to open a desktop application and manually enter reference data?
  2. How can one exploit the web and its network effects to allow users to benefit from the social aspects of reference management? It’s really hard to keep up with new work in my own field, let alone affiliated ones, so why can’t my reference management solution give me hints once in a while based on what others with similar interests are reading?
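The second problem is, at heart, a collaborative-filtering problem. As a minimal sketch (not how CiteULike or Firefox Scholar actually do it; the `recommend` function and the sample libraries are hypothetical), suggestions could come from weighting what other users read by how much their libraries overlap with mine:

```python
from collections import Counter


def recommend(user, libraries, min_overlap=1, limit=5):
    """Suggest references from users whose libraries overlap with `user`'s.

    `libraries` maps a (hypothetical) user name to a set of reference keys.
    Each candidate reference is scored by how many references its reader
    shares with `user`, so closer matches count for more.
    """
    mine = libraries[user]
    scores = Counter()
    for other, refs in libraries.items():
        if other == user:
            continue
        overlap = len(mine & refs)
        if overlap < min_overlap:
            continue  # no shared interests, so ignore this reader
        for ref in refs - mine:
            scores[ref] += overlap
    return [ref for ref, _ in scores.most_common(limit)]


# Hypothetical sample data.
libraries = {
    "alice": {"smith2004", "jones2005"},
    "bob": {"smith2004", "jones2005", "lee2006"},
    "carol": {"smith2004", "wu2003"},
    "dave": {"patel2001"},
}

print(recommend("alice", libraries))  # ['lee2006', 'wu2003']
```

Here bob shares two references with alice and carol shares one, so bob’s unread paper ranks first. A real service would need to handle popularity bias and privacy, which this sketch ignores.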

Now, how do they solve these problems?

PyULike, like its predecessor, is based on a fully-centralized model. To quote Richard:

Previously I’ve resisted releasing or “open sourcing” code for the site for reasons which I outline on the site’s FAQ. Briefly, these are that I wish to prevent fragmentation of the userbase among a thousand private installations of the CiteULike software…. The benefits of keeping things centralised is that we keep the community effects. Users find others who are reading the same material, and they find papers serendipitously which they wouldn’t otherwise.

Firefox Scholar (aka SmartFox) is based on a slightly different, more distributed model. Reference data will be stored locally, in Firefox 2.0’s embedded SQLite database. Users will be able to extract references from the pages they are browsing, as well as enter and edit references manually within Firefox.

The developers then plan to add the ability to sync that data with a central server, to provide similar social networking support. The plug-in will also be fully open source, under the GPL.
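To make the local-first, sync-later model concrete, here is a minimal sketch using Python’s built-in `sqlite3` module. It assumes a hypothetical `refs` table with a `synced` flag, and a `push` callback standing in for the server call; it is not Firefox Scholar’s actual schema or code, which would live inside the browser rather than in Python.

```python
import sqlite3

# Hypothetical schema: one row per reference, plus a flag recording whether
# the row has been pushed to a central server since it was last edited.
SCHEMA = """
CREATE TABLE IF NOT EXISTS refs (
    id     TEXT PRIMARY KEY,
    title  TEXT NOT NULL,
    author TEXT,
    year   INTEGER,
    synced INTEGER NOT NULL DEFAULT 0
)
"""


def open_store(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_ref(conn, ref_id, title, author=None, year=None):
    """Insert or overwrite a reference; any local edit marks it unsynced."""
    conn.execute(
        "INSERT OR REPLACE INTO refs (id, title, author, year, synced) "
        "VALUES (?, ?, ?, ?, 0)",
        (ref_id, title, author, year),
    )
    conn.commit()


def sync(conn, push):
    """Pass each unsynced reference to `push`, then mark it synced.

    `push` stands in for whatever call would send data to a server.
    Returns the number of references pushed.
    """
    cols = ("id", "title", "author", "year")
    rows = conn.execute(
        "SELECT id, title, author, year FROM refs WHERE synced = 0 ORDER BY id"
    ).fetchall()
    for row in rows:
        push(dict(zip(cols, row)))
    conn.executemany("UPDATE refs SET synced = 1 WHERE id = ?", [(r[0],) for r in rows])
    conn.commit()
    return len(rows)
```

For example, with `sent = []`, calling `sync(conn, sent.append)` collects what would be uploaded; a second call with no intervening edits pushes nothing. The point of the design is that the local store works on its own, and syncing is an optional extra rather than a requirement.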

OK, but as a user and developer, I’m not so sure I want to be left with such discrete, all-or-nothing choices. Why couldn’t I, for example, use SmartFox locally in my browser but have it sync with PyULike’s server? More importantly, I don’t accept the notion that social networking requires a centralized server. Can we not realize the same vision for these tools in a more distributed fashion, using RDF and SPARQL, or Atom?
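As one illustration of the distributed alternative, each user’s tool could publish its references as an Atom feed that other users, or an aggregating server, subscribe to. Here is a minimal sketch using Python’s `xml.etree.ElementTree`, assuming one entry per reference; the dict keys and `refs_to_atom` are hypothetical, and since Atom’s core elements carry no bibliographic fields, a real format would need an extension (RDF, say) for that metadata.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def atom(tag):
    """Qualify a tag name with the Atom namespace, in ElementTree's {ns}tag form."""
    return f"{{{ATOM_NS}}}{tag}"


def refs_to_atom(feed_id, title, refs):
    """Render a list of reference dicts as an Atom feed string.

    Each dict needs hypothetical keys "uri", "title", and "author";
    "updated" is optional and defaults to the current UTC time.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    feed = ET.Element(atom("feed"))
    ET.SubElement(feed, atom("id")).text = feed_id
    ET.SubElement(feed, atom("title")).text = title
    ET.SubElement(feed, atom("updated")).text = now
    for ref in refs:
        entry = ET.SubElement(feed, atom("entry"))
        ET.SubElement(entry, atom("id")).text = ref["uri"]
        ET.SubElement(entry, atom("title")).text = ref["title"]
        ET.SubElement(entry, atom("updated")).text = ref.get("updated", now)
        author = ET.SubElement(entry, atom("author"))
        ET.SubElement(author, atom("name")).text = ref["author"]
    return ET.tostring(feed, encoding="unicode")
```

A subscriber only needs to poll the feed and merge new entries into its own store, so no single server has to own the data.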

Finally, I’ll reiterate the point I’ve repeatedly made: we need to get this stuff integrated into the desktop and publishing workflow. If I’m using PyULike or SmartFox (or both), I really need to be able to easily insert my citations into Word or OpenOffice. Microsoft is already adding the infrastructure to allow this in Word 2007, and we are trying hard to make the same happen in OpenOffice. Until that happens, only part of the puzzle is in place.

So how about some collaborative discussion among these projects so that we can have real interoperability, not only between these projects, but also between them and OpenOffice and Word? Maybe we could even settle on compatible licenses so that we can share code where appropriate.

2 Comments

  1. Hey Bruce,

    I guess maybe I didn’t explain all of what I’ve been working on, but take what you have just described, add several more features to the mix, and you have the webtop app that is currently only accessible via the PyPod.Net console application: http://pypod.net/console/index.html

    Unfortunately, this is WinXP/2003/Vista|.NET 2.0 only at the moment, but I am working on getting a Mono version running as well, which will then run on the Mac.

    But yeah, you’re right… this is the direction we all need to be headed: using Atom feeds to keep in sync with the information we are most interested in. Add to this the LLUP/Blip Decentralized Messaging Protocol (see http://www.x2x2x.org/projects/wiki/doku.php?id=llup for an (OLD!) overview, and http://www.x2x2x.org/projects/wiki/doku.php?id=llup:spectemplate for the current state of the specification) and you have a nice mix of push and pull decentralized data-sharing and messaging services. These let you both access and share anything, from cookie recipes to vacation pics to code, as well as all of the Semantic Web-like stuff, but in a MUCH simpler and easier-to-understand package. In other words, it hides the more difficult but important stuff like SPARQL/RDF/etc. and implements a simple inter/intra-document messaging format that uses simple REST-style URIs to send and receive messages between particular points inside documents. (This is the AtomicTalk project, which I haven’t discussed with anyone outside a fairly restricted group of folks, and don’t really plan to until it’s ready to be released as Alpha code.)

    The future of data and document programmability is an important focus, and it needs to be as SIMPLE AS POSSIBLE. That’s where my focus is: making all of the above work with a simplified inter/intra-document communication protocol that allows you to wrap together more complex, yet reusable, data queries and processing algorithms, and send this enclosed package to any other place on the planet that is accessible via a URI, including (and in particular) the internals of a document (e.g. https://foo.org/book/chapter/section/page#footer).

    Or, to put this in a more real-world scenario: I have a document on a computer somewhere else on the planet that I am working on collaboratively with someone else, and I want to add a reference to the footer of a particular page. That reference exists in a DB somewhere on the planet, and extracting it requires a fairly complex SPARQL-based query. But since the complex SPARQL query can easily be reused by passing in the data points it operates on, I can simply create an enclosure containing all of the necessary pieces of information and where they can be obtained from (the reusable SPARQL query, the DB in which the information exists, and the data points to be operated upon). Using the above URI, I send this enclosure to the specified ID inside the specified document. As long as I have the proper permissions to access that portion of the document, the document will consume the enclosure and carry out the process, caching what is statically cacheable and storing the dynamic query mechanisms for the pieces that are not.

    @[Message: (package: (destination: https://foo.org/book/chapter/section/page#footer), (dataservice: http://foo.org/data/service/endpoint?{$Stored-SPARQL-Query, $datapoint1, $datapoint2}), (credentials: $mypublickey, https://validate.uri.foo/location/to/query/to/determine/if/this/message/really/came/from/who/claims/to/be/the/sender))]

    This is an early-stage look at the syntax, but the similarities to Lisp/Scheme/Smalltalk are completely by accident intended ;)

    NOTE: I’m not sure whether WordPress allows the del and ins tags, so if “by accident” doesn’t have a strikethrough and “intended” isn’t underlined, just pretend they do :)
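To make the enclosure idea a little more concrete, here is a minimal Python sketch that models the message as a data structure. It is an illustration under stated assumptions, not the syntax or implementation described above: the `Message` class, its field names, and the `query`/`datapoint` parameter names are hypothetical, and the foo.org URIs are the placeholder addresses from the example.

```python
from dataclasses import dataclass, field
from urllib.parse import urldefrag, urlencode


@dataclass
class Message:
    """A hypothetical model of the enclosure described above.

    This is not the AtomicTalk or LLUP format; the field names simply mirror
    the parts of the example syntax.
    """

    destination: str  # document URI whose fragment names the target spot
    dataservice: str  # endpoint that runs the stored, reusable query
    stored_query: str  # identifier of that query (e.g. a saved SPARQL query)
    datapoints: list = field(default_factory=list)  # values the query needs
    public_key: str = ""
    validation_uri: str = ""  # where a receiver could verify the sender

    def target(self):
        """Split the destination into (document URI, fragment id)."""
        doc, frag = urldefrag(self.destination)
        return doc, frag

    def query_url(self):
        """Build a GET URL carrying the query id and data points as parameters."""
        params = [("query", self.stored_query)]
        params += [("datapoint", d) for d in self.datapoints]
        return f"{self.dataservice}?{urlencode(params)}"
```

A receiver would, in principle, verify the sender via `validation_uri`, fetch `query_url()`, and insert the result at the fragment returned by `target()`; permissions, signatures, and caching are deliberately left out.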

  2. Uggg! None of the intended syntactical “prettyness” came through…

    For now, I guess you will have to use your imagination (or look at some Lisp code ;)


Creative Commons License