Monday, October 14, 2013

Freely-Speaking: Biology and Big Data

Computational sciences at the interface with the hard sciences are gaining in importance, as shown by this year's Nobel Prize in Chemistry. I also recently read two commentaries, one in Nature and one in Nature Biotechnology, that I found interesting.

The first article [1] discussed the need for systems that can handle "Big Data" in biology. Despite this need, no widely adopted system has emerged yet. The author attributes this to three problems:
  • Biological data exists in a variety of changing formats.
  • Commercial systems may impose steps that do not fit the way data is recorded or the scientist's workflow, or they may add extra steps for converting data from one format to another, arguably to keep scientists from using rival systems.
  • Many home-grown solutions are not written robustly and are poorly documented, which makes them difficult to modify.

In response to the growing gap between data generation on one side and the capture, storage, annotation, and retrieval of biological information across different systems on the other, the second article [2] summarized interviews that Nature Biotechnology conducted with developers of successful computational biology software.

In its summary, the article described a communication gap between dry-lab and wet-bench biology: wet-bench biologists often underappreciate computation and the craft of software engineering, while computational scientists are often oblivious to the need to make their tools accessible and comprehensible to the wider biology audience.

Both articles suggest that a change in mentality and closer collaboration between the two sides will be necessary to build successful software solutions for big data problems in biology. These articles describe very well what I have experienced myself over the last two years, and the solutions they suggest are mostly in line with what I think the solutions ought to be. I am happy to be working toward solutions in this space together with my team.

Literature Cited:

[1] Boyle J. "Biology must develop its own big-data systems". Nature 499, 7 (04 July 2013). Last visited: 2013-10-13.
[2] "In need of an upgrade". Nature Biotechnology 31, 837 (2013). Last visited: 2013-10-13.