Recap: SciPy 2015 - Day 1

This year I am attending SciPy, a conference about scientific computing with Python organized by Enthought. I am really happy that I chose SciPy as this year's conference to attend because there is a lot of energy here. In their welcome remarks, the committee members noted that conference attendance has grown from the 300s to 600 over just the past 3 years.


I talked to some people who attended the tutorials, and I am told that the 8-hour scikit-learn tutorial was excellent. Video and Jupyter notebook links to follow:

 
What follows are some quick notes on each of the talks that I attended, as well as a few notes on other notable things.

Chris Wiggins, Chief Data Scientist at the New York Times, gave today's keynote speech. His speech was illustrative because it showed how pervasive the need to convert data into information really is: even, or perhaps especially, traditional journalism needs data science. He showed how unsupervised learning, supervised learning, and reinforcement learning are used at the NYT to explore, learn, test, and optimize the user experience in support of quality journalism.



James Crist from Continuum Analytics talked about Dask, a parallel computing framework that makes use of the numpy/scipy ecosystem. The emphasis of the talk was on how Dask can be used to analyze data sets that are too big to be loaded into memory. The use of blocked algorithms leads to significant reductions in memory usage as well as computation time. Dask consists of a graph specification module, a scheduler, and a collections module built on top of the first two components. This was an excellent talk worth watching.
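To make the two ideas from the talk a bit more concrete, here is a small sketch in plain Python. It does not use Dask itself: `blocks`, `blocked_mean` and `get` are names I made up for illustration, and `get` is only a toy stand-in for a scheduler. The first part computes a mean one block at a time, so only one block needs to be held in memory at once (here the data is an in-memory list, but it could just as well be read from disk block by block). The second part mimics the shape of a task graph as I understand it: a dictionary that maps keys either to data or to a tuple of a function followed by its arguments.

```python
from operator import add


def blocks(values, size):
    """Yield consecutive chunks of at most `size` items."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def blocked_mean(values, size):
    """Mean computed block by block, keeping only running totals."""
    total = 0.0
    count = 0
    for block in blocks(values, size):
        total += sum(block)
        count += len(block)
    return total / count


# A hand-written task graph: keys map to data or to (function, *argument_keys).
graph = {
    "x0": [1, 2, 3],
    "x1": [4, 5, 6],
    "s0": (sum, "x0"),
    "s1": (sum, "x1"),
    "total": (add, "s0", "s1"),
}


def get(graph, key):
    """Toy recursive evaluator (hypothetical, not Dask's scheduler)."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # String arguments that name a key in the graph are evaluated first.
        resolved = [
            get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task


print(blocked_mean(list(range(10)), 3))
print(get(graph, "total"))
```

A real scheduler would run independent tasks such as `s0` and `s1` in parallel and free intermediate results as soon as they are no longer needed; this sketch just walks the graph recursively.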

Robert Grant from Enthought talked about DistArray, which interestingly seemed very similar in concept to the previous talk. While the previous talk focused on out-of-core parallelization, this talk focused on the DistArray data structure, which is essentially a proxy object that keeps track of the local numpy-like arrays held on each of the worker machines. So DistArray is a data structure that enables work to be parallelized across physically separate worker machines. It is also worth watching to compare with the previous talk. It seems that Continuum and Enthought are thinking along similar lines. By that I mean that they are trying to tackle the same challenges: how to cut down compute time and how to process large amounts of data.
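The proxy idea can be sketched in a few lines of plain Python. To be clear, this is not DistArray's API: `LocalChunk` and `ProxyArray` are hypothetical names, and the "workers" here are just objects in one process rather than separate machines. The point is only that the client-side object holds references to per-worker pieces and forwards operations to them.

```python
class LocalChunk:
    """Stands in for the piece of the array held by one worker."""

    def __init__(self, worker_id, values):
        self.worker_id = worker_id
        self.values = list(values)


class ProxyArray:
    """Hypothetical client-side handle that tracks where each chunk lives."""

    def __init__(self, chunks):
        self.chunks = chunks

    @classmethod
    def scatter(cls, values, n_workers):
        """Split `values` into contiguous chunks, one per worker."""
        size = -(-len(values) // n_workers)  # ceiling division
        chunks = [
            LocalChunk(w, values[w * size:(w + 1) * size])
            for w in range(n_workers)
        ]
        return cls(chunks)

    def map(self, func):
        """Apply `func` elementwise; each chunk is processed on its own."""
        return ProxyArray([
            LocalChunk(c.worker_id, [func(v) for v in c.values])
            for c in self.chunks
        ])

    def sum(self):
        """Combine per-chunk partial sums into a global result."""
        return sum(sum(c.values) for c in self.chunks)

    def __len__(self):
        return sum(len(c.values) for c in self.chunks)


arr = ProxyArray.scatter(list(range(10)), n_workers=4)
squared = arr.map(lambda v: v * v)
print([len(c.values) for c in squared.chunks])
print(squared.sum())
```

In a real distributed setting the elementwise step would run on each worker and only the small partial sums would travel back over the network, which is where the time savings come from.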

Jessica Hamrick gave a talk about how Jupyter, JupyterHub and NBgrader can be used in a classroom environment to aid learning. For those interested in seeing how technology can enhance classroom learning, this is a highly interesting talk.


Yannick Congo gave a superb talk despite not having visual aids (technology fail). His talk focused on tools used to make the generation of simulation data reproducible and transparent. Skip to 16-17 minutes into the talk to see the demonstration of his work. He uses Docker and Sumatra (?).


I attended the Visualization track in the afternoon which was full of really, really good talks. All of those talks should be watched.

Bryan Van de Ven gave us the state of the art of Bokeh. The whole idea behind Bokeh is to generate interactive graphs. This summary does not do the talk justice; just watch it for a visual guide to Bokeh's capabilities. A few notes: development of these tools is progressing at lightning speed. Last year, Bokeh was at version 0.5. This year it is at 0.9.1 and, from talking to Bryan after the talk, the goal is to reach 1.0 around the end of the year. Currently, some JS callbacks have made their way back into Bokeh. The server is being reworked again to refine it. The goal is to write wrappers around commonly discovered patterns. Bokeh bindings for R, JS and Scala exist now!


Luke Campagnola talked about VisPy, which is a module designed for visualization of large data sets using GPU acceleration via OpenGL. VisPy was developed because matplotlib was too slow for fast data acquisition and display. 


Jean-Luc R., in my opinion, gave the most intriguing talk, introducing HoloViews: a library of declarative objects that hold data, can be composed, and visualize themselves. The implication for scientists is that they don't necessarily have to learn matplotlib to view their data. They just need to compose the data they want to see. This approach simplifies Jupyter notebooks.



Matthias Bussonnier and Kester Tong gave a joint talk highlighting the collaboration between Jupyter and Google's co-laboratory project. I had never thought about it this way, but real-time collaboration using Jupyter is not a trivial task. The short of the talk: it's a work in progress.


I heard that the Deep Learning talk given by Kyle Kastner, which ran at the same time as Matthias' talk, was really good and worth watching. Link follows:


I also attended talks on PyStruct and pgmpy, both of which were about structured/probabilistic model prediction using Python. These talks went over my head, so I will have to rewatch them.



If you want to read summaries of other days, go here:

Day 2 Summary
Day 3 Summary
