Recap: SciPy 2015 - Day 2

Wes McKinney gave the keynote on the second day. The talk was more of a retrospective on his personal journey, with a view of the future for Python and the greater data science community. Interestingly, some of the tools seem to have started a "long time ago", in 2008. Wes talked about 2011 being the year when pandas development took off again. Thinking about my own history, I joined Amyris in 2011 as part of the Enzymology department, which doesn't feel that long ago. Pandas bug and design fixes and data-wrangling capabilities were implemented from June 2011 to July 2012, a period that ended just 3 months before I joined the software engineering department, and that feels really recent.




Phillip Cloud gave a talk on Blaze and Odo. Blaze is an interface for data-centric computation. It consists of expressions and compute recipes and follows design principles similar to those found in dplyr or SQLAlchemy. Blaze expressions describe a table and the computation to run on it. Compute recipes contain the correct implementation for each backend. Odo is a library for converting data sets from one format to another.
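
To make the split between expressions and compute recipes more concrete, here is a minimal sketch using only the standard library. It is not Blaze's API: the `SumOf` class, the `compute` function, and the `payments` table are made up for illustration, and `functools.singledispatch` stands in for whatever dispatch mechanism Blaze actually uses. The expression only describes what to compute, and a separate recipe per backend does the work.

```python
import sqlite3
from dataclasses import dataclass
from functools import singledispatch


@dataclass(frozen=True)
class SumOf:
    """Hypothetical expression: describes 'sum of one column', computes nothing."""
    table: str
    column: str


@singledispatch
def compute(backend, expr):
    # Fallback when no recipe is registered for this backend type.
    raise NotImplementedError(f"no recipe for {type(backend).__name__}")


@compute.register
def _(backend: dict, expr: SumOf):
    # Recipe for in-memory data: {table name: list of row dicts}.
    return sum(row[expr.column] for row in backend[expr.table])


@compute.register
def _(backend: sqlite3.Connection, expr: SumOf):
    # Recipe for SQL: translate the same expression into a query.
    query = f'SELECT SUM("{expr.column}") FROM "{expr.table}"'
    return backend.execute(query).fetchone()[0]


# One expression, two backends.
rows = [{"name": "a", "amount": 10}, {"name": "b", "amount": 32}]
expr = SumOf("payments", "amount")

in_memory = {"payments": rows}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE payments (name TEXT, amount INTEGER)")
conn.executemany("INSERT INTO payments VALUES (:name, :amount)", rows)

total_memory = compute(in_memory, expr)
total_sql = compute(conn, expr)
print(total_memory, total_sql)
```

The point of the design is that the expression never changes when the data moves from a Python list to a database; only the recipe that gets picked does.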



Stephan Hoyer talked about xray. Xray was motivated by the need in climate science to work on large, multi-dimensional scientific data with lots of labels and structure. It is pandas-like, but for multi-dimensional data. The main data structures are DataArray (like pandas.Series) and Dataset (like pandas.DataFrame). Xray works with Dask.




In the afternoon, I attended two computational biology talks.

Jai Ram Rideout and Evan Bolyen introduced scikit-bio, a new bioinformatics library currently in beta development. Being a "scikit" package, it is designed to work with other current Python ecosystem modules like numpy, scipy, pandas, and scikit-learn. Note that Biopython is not designed to work with the numpy/scipy ecosystem. I found it interesting that, while describing their release cycle scheme, the core coders of this project specifically stated that they wanted to create a bioinformatics package that is coded to higher standards than the usual bioinformatics package. Because of this, I think this is a package to keep an eye on.



Alex Rubinsteyn talked about PyEnsembl and Varcode. PyEnsembl is a Python wrapper for different Ensembl genome annotations. Varcode compares collections of variants between WT and mutant. This currently only works for human genomes, but the plan is to generalize to other organisms. There seem to be performance issues with large numbers of sequences.



Zubin Dowlaty gave an energetic talk about leveraging design thinking for building scalable enterprise intelligent systems. Due to the data explosion, there is a need to develop scalable predictive applications to provide an edge. He notes that the data warehouse model is a failed model. Dashboards are boring. One of his points was that all common robust methods should be used for any problem. Lots of words, enthusiasm and big ideas...but I am not sure I understood what he was trying to say :-)

Thomas Caswell gave the matplotlib state of the library talk. In short, the project is still active and alive. The 1.5 release, which contains a new default color map, will happen at the end of the month. A 2.0 release is targeted for September. A 2.1 release is planned for March 2016. Matplotlib started shipping an interactive IPython backend with V1.4.3. 3D interactive graphs looked a lot choppier compared to Bokeh, though this might be a system difference I am seeing. Seaborn style will come by default with V1.5. A graph no longer needs to be redrawn if some of its properties, like line thickness and style, are updated. Matplotlib 1.5 and higher will support Python 3. This talk is worth checking out for all matplotlib users because there really are a lot of new features now available.



Stephen Hoover works on a cloud-based data science platform that handles everything from data import, to data query, to predictive analytics, to automation of the analysis pipeline. The web interface is written in Ruby and JS, but the predictive modeling is done in Python 3, not 2.7! The lessons learned are probably useful for people in the Data Science/Scientific Computing group to watch. An interesting remark was that historically R had a lot more data analysis packages available, but the difference is rapidly disappearing.



Jaime Huerta-Cepas talked about his package called ETE, which is a comprehensive environment for handling and visualizing tree structures. The package contains built-in functions to traverse, annotate, and modify trees, calculate distances, perform tree comparisons, and visualize trees (by generating PNG, PDF or SVG images). Interactive tree images can also be generated. Currently, the browser view and IPython support are not in production, but the author has been thinking about them and experimenting with them. Perhaps worth checking out if you have to come up with phylogenetic trees based on large alignment data.
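
For readers who have not worked with tree libraries, here is a rough standard-library sketch of two of the operations mentioned above: traversing a tree and calculating the distance between two nodes. This is not ETE's API. The `Node` class, its `dist` branch-length field, and the `distance` helper are all hypothetical names made up for this example, and the helper assumes both nodes belong to the same tree.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional


@dataclass(eq=False)  # identity-based equality, so nodes can be dict keys
class Node:
    name: str
    dist: float = 0.0  # branch length to the parent
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add_child(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return child

    def traverse(self) -> Iterator["Node"]:
        # Preorder: visit a node first, then each of its subtrees.
        yield self
        for child in self.children:
            yield from child.traverse()

    def leaves(self) -> List["Node"]:
        return [n for n in self.traverse() if not n.children]


def distance(a: Node, b: Node) -> float:
    """Sum of branch lengths on the path from a to b (same tree assumed)."""
    # Record the distance from a to each of its ancestors (and to itself).
    from_a = {}
    node, total = a, 0.0
    while node is not None:
        from_a[node] = total
        total += node.dist
        node = node.parent
    # Walk up from b until we reach a node on a's path to the root.
    node, total = b, 0.0
    while node not in from_a:
        total += node.dist
        node = node.parent
    return total + from_a[node]


# A small tree: (A:1.0, (B:2.0, C:3.0)inner:0.5)root
root = Node("root")
leaf_a = root.add_child(Node("A", 1.0))
inner = root.add_child(Node("inner", 0.5))
leaf_b = inner.add_child(Node("B", 2.0))
leaf_c = inner.add_child(Node("C", 3.0))

leaf_names = [n.name for n in root.leaves()]
print(leaf_names, distance(leaf_b, leaf_c), distance(leaf_a, leaf_b))
```

A real phylogenetics package adds parsing, annotation and rendering on top of a structure like this, which is where the work is.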


If you want to read summaries of other days, you can read them here:

Day 1 Summary
Day 3 Summary
