A Faceted Browser over SPARQL Endpoints

I have recently been working on (yet another?) faceted browser over RDF data… more precisely, RDF data loaded in a SPARQL endpoint that supports COUNT and GROUP BY queries. I have successfully used it against Fuseki, the Talis platform (tested against http://api.talis.com/stores/bbc-backstage/services/sparql) and Virtuoso (tested against http://dbpedia.org/sparql).

Main characteristics:

  1. Configurable: most aspects of the browser are configurable through two JSON files (configuration.json and facets.json). This includes basic templating & styling ability. To change the style, add a facet, or browse completely different data, just edit the JSON files accordingly and reload the page
  2. No preprocessing required: as all requests are standard SPARQL queries, nothing is required a priori on either the publisher or the consumer end
  3. Facets are defined as triple patterns (see example), so facet values don't need to be directly associated with the browsed items, i.e. they can be a few hops away from them
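As a rough illustration of points 2 and 3, a facet defined by a chain of triple patterns would translate into a plain COUNT/GROUP BY query against the endpoint. The sketch below builds such a query in Python; the facet structure and all names are hypothetical (the browser's actual facets.json schema may differ), and the BBC programme URIs are just an example.

```python
# Hypothetical sketch: turning a facet definition (a chain of triple
# patterns) into the COUNT/GROUP BY query sent to the SPARQL endpoint.
# The real browser's query shape and facet schema may differ.

def facet_count_query(item_pattern, facet_patterns, facet_var):
    """Build a query counting browsed items per facet value.

    item_pattern:   pattern selecting the browsed items (binds ?item)
    facet_patterns: triple patterns linking ?item to the facet value,
                    possibly several hops away
    facet_var:      the variable holding the facet value
    """
    where = " .\n  ".join([item_pattern] + facet_patterns)
    return (
        f"SELECT {facet_var} (COUNT(DISTINCT ?item) AS ?count)\n"
        f"WHERE {{\n  {where}\n}}\n"
        f"GROUP BY {facet_var}\n"
        f"ORDER BY DESC(?count)"
    )

# A facet two hops away from the browsed items: programme -> brand -> title
query = facet_count_query(
    "?item a <http://purl.org/ontology/po/Programme>",
    ["?item <http://purl.org/ontology/po/brand> ?b",
     "?b <http://purl.org/dc/terms/title> ?brandTitle"],
    "?brandTitle",
)
print(query)
```

Because the result is an ordinary SPARQL 1.1 aggregate query, any endpoint supporting COUNT and GROUP BY can answer it without preprocessing.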

See the screenshot below to get a feel for the browser…

If you have used Google Refine before, the resemblance is probably clear. Indeed, I am reusing the JavaScript and CSS code behind the facets part of Google Refine (gratefully available under the New BSD License… how much I love open source!).

Having it running

The code is shared on GitHub. Grab it, deploy it to a Java application server, and then play with the configuration files (a number of examples are provided with the source).

Outlook and //TODO

The most exciting part to me (thanks to the Google Refine inspiration) is that all the needed knowledge about the endpoint and the facets is maintained as JSON on the client side and communicated to the server upon each request. If a user somehow updates this configuration and submits it to the server, she can customise her view (as an example, I added a change button to each facet which allows the user to change only the facet she sees… potentially an add-facet button could be added).
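A minimal sketch of what that round trip could look like, assuming a hypothetical facet schema (the field names below are illustrative, not the browser's actual JSON):

```python
# Hypothetical sketch: the client keeps the facet configuration as JSON
# and sends it with every request, so editing it changes the view.
# The "name"/"pattern" fields are invented for illustration.
import json

facets = [
    {"name": "Type", "pattern": "?item a ?value"},
]

def change_facet(facets, old_name, new_facet):
    """Replace one facet in the client-side configuration -- roughly
    what a per-facet 'change' button would do."""
    return [new_facet if f["name"] == old_name else f for f in facets]

updated = change_facet(
    facets, "Type",
    {"name": "Publisher",
     "pattern": "?item <http://purl.org/dc/terms/publisher> ?value"})

payload = json.dumps({"facets": updated})  # shipped to the server on each request
```

Since the server is stateless with respect to this configuration, an add-facet button would just append one more entry to the same list before submitting.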

Additionally, a list of issues and features to add is on the github repository.

Comments/feedback are very warmly welcomed 🙂

 

Update 19/03/2012: support for full-text search facets added. Currently supports Virtuoso and standard SPARQL (using regex). See example
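A hedged sketch of what the two full-text strategies might look like in the generated query: Virtuoso's free-text predicate versus the portable regex FILTER fallback. The exact clause the browser emits may differ.

```python
# Sketch of the two full-text facet strategies mentioned above.
# The browser's actual generated clauses may differ.

def fulltext_filter(var, text, engine="standard"):
    if engine == "virtuoso":
        # Virtuoso-specific free-text predicate (uses the text index)
        return f'{var} bif:contains "{text}"'
    # Portable but slower fallback: case-insensitive regex over the literal
    return f'FILTER regex(str({var}), "{text}", "i")'

print(fulltext_filter("?label", "london"))
# FILTER regex(str(?label), "london", "i")
print(fulltext_filter("?label", "london", engine="virtuoso"))
# ?label bif:contains "london"
```

The regex form works on any SPARQL 1.0+ endpoint but scans literals; bif:contains is Virtuoso-only but can use its full-text index.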


Kasabi directory matrix

Kasabi is a recent player in the data marketplace space. What distinguishes Kasabi from other marketplaces (and makes it closer to my heart) is that it is based on Linked Data. All the datasets in Kasabi are represented in RDF and provide Linked Data capabilities (with an additional set of standard and customised APIs for each dataset… more details).

A recent dataset on Kasabi is the directory of datasets on Kasabi itself. Having worked on related stuff before, especially dcat, I decided to spend this weekend playing with this dataset (not the best plan for a weekend, you think, huh?!).

To make a long story short, I built a (currently-not-very-helpful) visualization of the distribution of the classes in the datasets, which you can see here.

In detail:
I queried the SPARQL endpoint for a list of datasets and the classes used in each of them, along with their counts (the Python code I used is on github; however, you need to provide your own Kasabi key and subscribe to the API).
Using Protovis, I visualized the data in a matrix. Datasets are sorted alphabetically, while classes are sorted in descending order of the number of datasets they are used in. Clicking on a cell currently shows the count for the corresponding (dataset, class) pair.
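The ordering logic is simple enough to sketch in a few lines. The actual visualization is Protovis/JavaScript, so this Python version (over invented data) is only to make the sort keys concrete:

```python
# Sketch of the matrix ordering described above, over invented data:
# datasets alphabetical, classes by the number of datasets using them,
# descending (ties broken alphabetically).

counts = {  # (dataset, class) -> instance count; values are made up
    ("bbc", "foaf:Person"): 300,
    ("bbc", "skos:Concept"): 25,
    ("hampshire", "foaf:Person"): 120,
    ("hampshire", "skos:Concept"): 40,
    ("music", "skos:Concept"): 15,
}

datasets = sorted({d for d, _ in counts})

by_usage = {}  # class -> number of datasets using it
for _, cls in counts:
    by_usage[cls] = by_usage.get(cls, 0) + 1
classes = sorted(by_usage, key=lambda c: (-by_usage[c], c))

print(datasets)  # ['bbc', 'hampshire', 'music']
print(classes)   # ['skos:Concept', 'foaf:Person']
```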

Note: I filtered out common schema-level classes like rdfs:Class, owl:DatatypeProperty, etc., and I also didn't include classes that appear in only one dataset.
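The two filters from the note can be expressed directly; this is a guess at their shape (the actual Python code on github may apply them differently, e.g. inside the SPARQL query):

```python
# Sketch of the two filters from the note above: drop schema-level
# classes, and drop classes that appear in only one dataset. The
# exclusion list and helper are illustrative, not the repo's code.

EXCLUDED = {
    "http://www.w3.org/2000/01/rdf-schema#Class",
    "http://www.w3.org/2002/07/owl#DatatypeProperty",
}

def keep(cls, datasets_using):
    """Keep a class only if it is not a schema-level class and is
    used in more than one dataset."""
    return cls not in EXCLUDED and datasets_using > 1

print(keep("http://xmlns.com/foaf/0.1/Person", 5))            # True
print(keep("http://www.w3.org/2000/01/rdf-schema#Class", 5))  # False
print(keep("http://xmlns.com/foaf/0.1/Person", 1))            # False
```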

Quick observations:
Not surprisingly, skos:Concept and foaf:Person are the most used classes. In general, the matrix is sparse, as most of the datasets are "focused". The Hampshire dataset, containing various information about Hampshire, uses a large number of classes.

This is still of limited value, but I have my ambitious plan below 🙂
//TODO
1. set the colour hue of each cell according to the corresponding count, i.e. the number of entities of the class in the dataset
2. group (and maybe colour) datasets based on their category
3. replace class URIs with CURIEs (using prefix.cc?)
4. when clicking on a cell, show the class structure in the corresponding dataset, i.e. what properties are used to describe instances of the class in that dataset (the problem here is that I need to subscribe to the dataset to query it). This could be a good example of the smooth transition in RDF from schema to instance data