Dave's Blog: The Web of Data and Experimental Observations

A few things this week got me thinking about the ideal way to represent experimental data. One was a technical problem I had been pondering: if I wanted to publish an experimental observation (not a paper, just a single observation), what would be the best way to do it? That got me thinking a little about the Semantic Web (or the Web of Data) and how it could relate to 'wet' biology. I am far from an expert on any of these things, so feel free to make your thoughts public.

The Web of Data and Ontologies

One of the things the architects of the internet have been concerning themselves with lately is the Semantic Web. Some authoritative links are here, here and here. The idea is that there is lots of data out there, but the web mostly deals in documents. The world will be a better place when computers can make connections between these pieces of data. This involves two concepts: the thing and the connection.

Things on the Web

Not everything is on the web. I, for example, am sitting in my living room and am definitely not on the web. Therefore, to refer to me, the web needs some kind of identifier. These are called Uniform Resource Identifiers (URIs). Mine could be something like http://davebridges.github.com#davebridges. URIs need to be unique and they need to be available on the internet. Anything can have a URI, and one thing can have several URIs. The key is that a URI should not belong to more than one thing. Things that have multiple URIs can be cross-referenced with specific vocabularies (e.g. owl:sameAs).
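To make the owl:sameAs idea a bit more concrete, here is a minimal sketch in plain Python (no RDF library) of how a program might work out that several URIs point at the same thing. The two example.org URIs and the function name `build_alias_groups` are made up for illustration; only the owl:sameAs URI itself is the real one from the OWL vocabulary.

```python
# The real URI behind the "owl:sameAs" shorthand.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Statements as (subject, predicate, object) tuples.
# The example.org URIs are hypothetical aliases, invented for this sketch.
same_as_statements = [
    ("http://davebridges.github.com#davebridges", OWL_SAME_AS,
     "http://example.org/people/dave-bridges"),
    ("http://example.org/people/dave-bridges", OWL_SAME_AS,
     "http://example.org/staff/42"),
]


def build_alias_groups(statements):
    """Group URIs linked, directly or transitively, by owl:sameAs."""
    parent = {}

    def find(uri):
        # Follow parent links to the representative URI of the group.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # shorten the chain as we go
            uri = parent[uri]
        return uri

    for subject, predicate, obj in statements:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


alias_groups = build_alias_groups(same_as_statements)
```

Here all three URIs end up in one group, even though the first and third are never linked directly.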

Connections (Ontologies)

Once things are on the internet, the basis of linked data is how these things relate to one another. For example, this blog post was created by me. So if there was some kind of explicit statement connecting the two, any computer could figure out that I wrote this post, or conversely that this post was written by me. The connections are defined by specific vocabularies or ontologies. For example, Dublin Core is a vocabulary about documents, and includes a term "creator". Therefore one could create a link between me and this post by writing something like this:

This Blog Post has a Creator named Dave Bridges

The important thing is that the ontology specifically defines the relationship between two URIs. Given this knowledge, a computer could find the creator of the page, or all pages created by me.
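As a rough sketch of what that statement looks like to a computer, here it is as a (subject, predicate, object) triple in plain Python, with the two lookups described above. The predicate is the Dublin Core "creator" term; the post URIs under example.org are hypothetical, and the helper names `creators_of` and `pages_created_by` are just ones I picked for the example.

```python
# Dublin Core "creator" term (DCMI Metadata Terms namespace).
DC_CREATOR = "http://purl.org/dc/terms/creator"

# Hypothetical URI for this post; the author URI is the one suggested earlier.
POST = "http://example.org/blog/web-of-data"
DAVE = "http://davebridges.github.com#davebridges"

# Each statement is a (subject, predicate, object) triple of URIs.
triples = [
    (POST, DC_CREATOR, DAVE),
    ("http://example.org/blog/another-post", DC_CREATOR, DAVE),
    ("http://example.org/blog/guest-post", DC_CREATOR,
     "http://example.org/people/someone-else"),
]


def creators_of(page, statements):
    """Return the creator URIs recorded for a page."""
    return [o for s, p, o in statements if s == page and p == DC_CREATOR]


def pages_created_by(person, statements):
    """Return the page URIs whose creator is the given person."""
    return [s for s, p, o in statements if p == DC_CREATOR and o == person]


post_creators = creators_of(POST, triples)
daves_pages = pages_created_by(DAVE, triples)
```

The same list of statements answers both questions; only the direction of the lookup changes.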

How Would This Work in Science

What got me thinking about this was how great it would be to have defined vocabularies to describe experimental results. For example, if there was an ontology that described a protein-protein interaction (there is, it's at http://bioportal.bioontology.org/ontologies/39508), one could use, for example, two PubMed links as URIs for the two proteins and indicate a molecular interaction between them. Given a large enough catalog of these statements, it would be possible to get a list of all molecular interactions for a particular protein.
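Here is a small sketch of that catalog lookup, again in plain Python. Everything named here is an assumption for the sake of the example: the `interactsWith` predicate URI is a placeholder and not a term taken from the ontology linked above, and the protein URIs are invented stand-ins for real database links.

```python
# Hypothetical predicate and protein URIs, for illustration only.
INTERACTS_WITH = "http://example.org/vocab#interactsWith"
PROTEIN_A = "http://example.org/protein/A"
PROTEIN_B = "http://example.org/protein/B"
PROTEIN_C = "http://example.org/protein/C"
PROTEIN_D = "http://example.org/protein/D"

# A tiny catalog of interaction statements as (subject, predicate, object).
catalog = [
    (PROTEIN_A, INTERACTS_WITH, PROTEIN_B),
    (PROTEIN_C, INTERACTS_WITH, PROTEIN_A),
    (PROTEIN_B, INTERACTS_WITH, PROTEIN_D),
]


def interaction_partners(protein, statements):
    """List every protein recorded as interacting with the given one.

    An interaction is treated as symmetric, so the protein may appear
    on either side of the statement.
    """
    partners = set()
    for subject, predicate, obj in statements:
        if predicate != INTERACTS_WITH:
            continue
        if subject == protein:
            partners.add(obj)
        elif obj == protein:
            partners.add(subject)
    return sorted(partners)


partners_of_a = interaction_partners(PROTEIN_A, catalog)
```

Treating the interaction as symmetric is a design choice in this sketch; a real vocabulary might distinguish, say, bait from prey.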

What About Non-Canonical Findings

I might talk about this later, but one important thing would be to be able to obtain not just a list of interactions, but also links to the specific data supporting (or refuting) each one. Ideally this would go deeper than just a link to the paper, perhaps to a separate URI describing a particular experiment.
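One way this might be modelled, purely as a thought experiment: each piece of evidence is its own record that points at an interaction, at the URI of a single experiment, and says whether it supports or refutes the claim. The `Evidence` class, its field names and all the URIs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    interaction: tuple  # pair of protein URIs (order does not matter)
    experiment: str     # URI describing one specific experiment
    supports: bool      # True if it supports the interaction, False if it refutes it


# Hypothetical URIs, for illustration only.
A = "http://example.org/protein/A"
B = "http://example.org/protein/B"

records = [
    Evidence((A, B), "http://example.org/experiment/1", True),
    Evidence((B, A), "http://example.org/experiment/2", False),
    Evidence((A, B), "http://example.org/experiment/3", True),
]


def evidence_for(interaction, evidence_records):
    """Split experiment URIs for an interaction into supporting and refuting."""
    key = frozenset(interaction)
    supporting = [r.experiment for r in evidence_records
                  if frozenset(r.interaction) == key and r.supports]
    refuting = [r.experiment for r in evidence_records
                if frozenset(r.interaction) == key and not r.supports]
    return supporting, refuting


supporting, refuting = evidence_for((A, B), records)
```

Keeping refuting experiments in the same catalog means a non-canonical result is as easy to find as the accepted one.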

The things that got me thinking about this were a question I posted on BioStar, a blog post on MolBio Research Highlights and a paper at Nature Precedings.