Vocabulary Standards

Posted by leeharland on November 8, 2011

On behalf of the authors, I'm pleased to say that our paper, "Empowering industrial research with shared biomedical vocabularies", is now out. In the article, a group of us, mainly from the pharma industry (but also from academia/public science and publishers), have tried to define a new approach to a problem that's been around for years. Our argument is central to precompetitive thinking: you don't know who you'll be partnering with tomorrow, so only open, shared standards will spare you a laborious integration effort every time.

We had comments back from a large number of reviewers (six, I think). Apart from the usual typos, inaccuracies and general bad grammar, there were some really positive reactions and a number of equally tough criticisms. A few of the criticisms have really stuck in my mind, but probably the most important was the challenge "does this really make financial sense?" I think that's an excellent question, but one that's very difficult to answer. Of course, one can add up the hours spent creating and curating vocabularies in-house and argue that sharing this work helps, and so on (which we sort of did). But does this really answer the question?

Quantifying the impact of data integration and eBiology within pharma research is notoriously difficult. I remember software providers coming to us with claims such as "this tool saves each scientist 5 minutes a day, so that's $xxx millions per year in saved time", which of course cuts no mustard with anyone. Even when you can demonstrate direct project impact, it's still incredibly hard to tie that to tangible numbers. A scientist once told me that the text-mining infrastructure we used to power our target landscape system helped them find a paper on a particular cytokine they would otherwise have completely missed. And this was a really important paper, one that helped connect the cytokine to a new piece of biology (new to them, at least). That's about as good a thumbs-up as bioinformatics infrastructure can get! But as to the value, who knows? It would be nigh on impossible to trace the impact of that particular paper on the production of a drug, or even of a candidate molecule. It's just a small part of the overall process. I recall that the project was terminated (not as a result of the paper, but because of the potential market size). So does that make our infrastructure irrelevant, or fabulous, even though it didn't increase bottom-line productivity?

I would choose to judge impact on more qualitative (i.e. woolly) metrics: what does this enable? Does it help scientists do more and find more? I strongly believe that better standards do this. It might not be immediately obvious, but if we're going to make sense of multiple layers of information, that information needs to be integrated in a meaningful way. But there is a cost to doing this, and is that cost worth what we get back? I look to examples such as the Gene Ontology (GO) or MedDRA, where shared effort has resulted in incredibly powerful resources. Yet we don't know the answer to the question we were asked, and the only way to find out is to press forward. It's likely we'll try a bunch of things and, yes, some will be a waste of money. But if companies can partner with each other and with public science to create resources with anything like the impact of GO, MedDRA et al., that would be a great success.

There's more discussion on this topic over at the Pistoia Vocabulary Standards site. If you're interested in what's going on, pop over there, and contact us if you want to know more.