

Showing posts from July, 2015

Freely Speaking: What does SciPy have to do with bio-based ideas??!

I was recently asked the above question. And it's a totally valid question as SciPy is somewhat outside of what I usually write about (biological topics, sustainable topics). There is a logical connection though, and it has to do with what I do at work.

Building biological entities is difficult because, unlike cars, biological entities like to "misbehave". I say misbehave, but it really only means that we don't understand microorganisms well enough to model them perfectly.

This is where data science comes in, of course. Through the collection of large data sets, data science and related fields can help us uncover patterns not seen before. These patterns then help make a better yeast model. Better models = faster product development. Faster product development = faster route to a sustainable business = more products with a positive impact.

So there is a link. Simple, right?

Recap: SciPy 2015 - Synopsis

SciPy 2015 has come and gone. If I step back, what are some of the lessons learned?

There were certain themes that recurred from talk to talk:

Speed. One of the perceived limitations of Python is speed of execution, which matters when processing very large data sets. Many talks dealt with this topic in various ways. Some of the approaches included process parallelization (Dask, DistArray), GPU acceleration (VisPy), and acceleration via some means of compilation, sometimes just-in-time (Numba). With these tools, Python is no longer slow. It's impressive that the combination of these approaches has enabled data scientists to process 60+ GB data sets as if they were loaded into memory on one small laptop that actually has only 8-16 GB of memory.

Visualization was a theme. So many talks dealt with making complex data sets visible, and they did so to address different issues: Serialization to enable interactivity (Bokeh, matplotlib), visualizing large dynamic datase…
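To make the larger-than-memory idea from the speed theme a bit more concrete, here is a minimal sketch in plain Python. It is not how Dask or any of the other tools mentioned actually work internally, and the names `chunked` and `streaming_mean` are made up for illustration. It only shows the basic trick of reducing a data set one chunk at a time so that the whole thing never has to sit in memory at once.

```python
from typing import Iterable, Iterator, List


def chunked(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size items (hypothetical helper)."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def streaming_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute a mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in chunked(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("cannot take the mean of an empty sequence")
    return total / count


# A generator stands in for a data set too large to materialize at once.
big_data = (float(i) for i in range(1_000_000))
result = streaming_mean(big_data, chunk_size=10_000)
print(result)
```

The real libraries go much further (task scheduling, parallel workers, compiled kernels), but the chunk-then-combine pattern is a reasonable mental model for how a 60+ GB data set can be handled on a laptop with far less memory.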

Recap: SciPy 2015 - Day 3

Jake VanderPlas, one of the main contributors to projects ranging from NumPy and SciPy to scikit-learn and mpld3, gave the keynote on the third day of the conference, talking about the state of scientific computing in Python. As a side note, Jake is a senior scientist and Director of Research at the eScience Institute at the University of Washington. I hear that he is currently involved in developing the data analysis pipeline for the LSST as well.

Recap: SciPy 2015 - Day 2

Wes McKinney gave the keynote on the second day. This talk was more of a retrospective, a personal journey with a view on the future of Python and the greater data science community. Interestingly, some of the tools seem to have started a "long time ago" - 2008. Wes talked about 2011 being the year when pandas development took off again. Thinking about my own history, I joined Amyris in 2011 as part of the Enzymology department, which doesn't feel that long ago. Pandas bug/design fixes and data wrangling capabilities were implemented from June 2011 to July 2012, which is just 3 months before I joined the software engineering department, and that feels really recent.

Recap: SciPy 2015 - Day 1

This year I am attending SciPy, a conference about scientific computing using Python organized by Enthought. I am really happy that I chose SciPy as the conference to attend this year because there appears to be a lot of energy here. In the welcome remarks by the committee members, it was noted that conference attendance has increased from the 300s to 600 over just the past 3 years.