
Recap: SciPy 2015 - Day 1

This year I am attending SciPy, a conference about scientific computing with Python organized by Enthought. I am really happy that I chose SciPy as this year's conference, because there is a lot of energy here. In their welcome remarks, the committee members noted that attendance has grown from the 300s to 600 over just the past 3 years.

I talked to some people who attended the tutorials and was told that the 8-hour scikit-learn tutorial was excellent. Video and Jupyter notebook links to follow:

Below are some quick notes on each of the talks I attended, as well as on a few other notable things.

Chris Wiggins, Chief Data Scientist at the New York Times, gave today's keynote. His speech illustrated how pervasive the need to turn data into information really is: even traditional journalism, or perhaps especially traditional journalism, needs data science. He showed how the NYT uses unsupervised learning, supervised learning, and reinforcement learning to explore, learn, test, and optimize the user experience in support of quality journalism.

James Crist from Continuum Analytics talked about Dask, a parallel computing framework that builds on the NumPy/SciPy ecosystem. The emphasis of the talk was on how Dask can be used to analyze data sets that normally could not be loaded into memory because they are too big. Using blocked algorithms leads to significant reductions in both memory usage and computation time. Dask consists of a graph specification module, a scheduler, and a collections module built on top of the first two components. This was an excellent talk and is worth watching.
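To make the "graph specification plus scheduler" idea concrete, here is a minimal toy sketch in plain Python. It mimics the shape of a Dask-style task graph (a dict whose values are either literals or tuples whose first element is a callable, with other keys as arguments) and evaluates it with a tiny recursive scheduler. The names `get`, `make_chunk`, and `blocked_mean_graph` are hypothetical, and this is not Dask's actual scheduler; it only shows how a blocked algorithm computes a mean one chunk at a time.

```python
from operator import truediv


def _is_key(graph, obj):
    """Return True if obj names a task in the graph (unhashable objects never do)."""
    try:
        return obj in graph
    except TypeError:
        return False


def _execute(graph, task, cache):
    """Evaluate one task expression, resolving any graph keys it references."""
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(_execute(graph, arg, cache) for arg in args))
    if isinstance(task, list):
        return [_execute(graph, item, cache) for item in task]
    if _is_key(graph, task):
        return get(graph, task, cache)
    return task  # a plain literal argument


def get(graph, key, cache=None):
    """Toy recursive scheduler: compute `key`, memoizing results in `cache`.

    Unlike a real scheduler, it keeps every intermediate result in memory.
    """
    if cache is None:
        cache = {}
    if key not in cache:
        cache[key] = _execute(graph, graph[key], cache)
    return cache[key]


def make_chunk(start, stop):
    """Hypothetical loader: materialize one block of the data (here, a range of ints)."""
    return list(range(start, stop))


def blocked_mean_graph(n, chunk_size):
    """Build a task graph that computes the mean of 0..n-1 one block at a time."""
    graph = {}
    sum_keys, count_keys = [], []
    for i, start in enumerate(range(0, n, chunk_size)):
        stop = min(start + chunk_size, n)
        graph[("chunk", i)] = (make_chunk, start, stop)
        graph[("sum", i)] = (sum, ("chunk", i))    # per-block partial sum
        graph[("count", i)] = (len, ("chunk", i))  # per-block element count
        sum_keys.append(("sum", i))
        count_keys.append(("count", i))
    graph["total"] = (sum, sum_keys)  # combine the partial results
    graph["count"] = (sum, count_keys)
    graph["mean"] = (truediv, "total", "count")
    return graph


graph = blocked_mean_graph(n=10, chunk_size=4)
print(get(graph, "mean"))  # 4.5
```

Because each task touches only one block, a real scheduler can load, reduce, and discard chunks independently (and in parallel); this toy keeps everything in `cache` for simplicity.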

Robert Grant from Enthought talked about DistArray, which, interestingly, seemed very similar in concept to the previous talk. While the previous talk focused on out-of-core parallelization, this talk focused on the DistArray data structure, which is essentially a proxy object that keeps track of local NumPy-like arrays on each of the worker machines. In other words, DistArray is a data structure that enables work to be parallelized across physically separate worker machines. It is also worth watching to compare with the previous talk. It seems that Continuum and Enthought are thinking along similar trajectories: both are trying to tackle the challenge of cutting down compute time and processing large amounts of data.
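The proxy idea can be sketched in a few lines of plain Python. In this hypothetical `ToyDistArray` (not DistArray's API), the "workers" are simulated as in-process lists: the proxy records which global index range each worker owns, forwards element access to the right local piece, and computes a reduction by combining per-worker partial results.

```python
class ToyDistArray:
    """Hypothetical proxy over per-worker pieces of a 1-D array.

    Workers are simulated as in-process lists; a real system would hold each
    piece on a separate machine and ship computations to it.
    """

    def __init__(self, data, n_workers):
        base, extra = divmod(len(data), n_workers)
        self.bounds = []        # (start, stop) global index range owned by each worker
        self.local_pieces = []  # the local array piece held by each worker
        start = 0
        for w in range(n_workers):
            stop = start + base + (1 if w < extra else 0)
            self.bounds.append((start, stop))
            self.local_pieces.append(list(data[start:stop]))
            start = stop
        self.size = len(data)

    def __len__(self):
        return self.size

    def _locate(self, i):
        """Map a global index to (worker id, local index)."""
        if not 0 <= i < self.size:
            raise IndexError(i)
        for w, (start, stop) in enumerate(self.bounds):
            if start <= i < stop:
                return w, i - start

    def __getitem__(self, i):
        w, local_i = self._locate(i)
        return self.local_pieces[w][local_i]

    def sum(self):
        # Each worker reduces its own piece; the proxy only combines partial sums.
        partials = [sum(piece) for piece in self.local_pieces]
        return sum(partials)


arr = ToyDistArray(list(range(10)), n_workers=3)
print(arr.bounds)         # [(0, 4), (4, 7), (7, 10)]
print(arr[5], arr.sum())  # 5 45
```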

Jessica Hamrick gave a talk about how Jupyter, JupyterHub, and NBgrader can be used in a classroom environment to aid learning. For those interested in seeing how technology can enhance classroom learning, this is a highly interesting talk.

Yannick Congo gave a superb talk despite having no visual aids (a technology failure). His talk focused on tools that make the generation of simulation data reproducible and transparent. Skip to 16-17 minutes into the talk to see a demonstration of his work. He uses Docker and Sumatra (?).

I attended the Visualization track in the afternoon, which was full of really, really good talks. All of them are worth watching.

Bryan Van de Ven gave us a state-of-the-art overview of Bokeh. The whole idea behind Bokeh is to generate interactive graphs. This summary does not do the talk justice; just watch it for a visual guide to Bokeh's capabilities. A few notes: development of these tools is progressing at lightning speed. Last year, Bokeh was at version 0.5. This year it is at 0.9.1, and according to Bryan, whom I talked to after the talk, the goal is to reach 1.0 by around the end of the year. Some JavaScript callback support has made it back into Bokeh. The server is being reworked again. The goal is to write wrappers around commonly discovered patterns. Bokeh bindings for R, JS, and Scala now exist!

Luke Campagnola talked about VisPy, which is a module designed for visualization of large data sets using GPU acceleration via OpenGL. VisPy was developed because matplotlib was too slow for fast data acquisition and display. 

Jean-Luc R. gave what was, in my opinion, the most intriguing talk, introducing HoloViews: a library of declarative objects that hold data, can be composed, and visualize themselves. The implication for scientists is that they don't necessarily have to learn matplotlib to view their data; they just need to compose the data they want to see. This simplifies Jupyter notebooks.
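A toy sketch of the declarative idea in plain Python: each object just holds data plus a label, composing two objects builds a new composite object, and the composite knows how to render itself (here as a text summary rather than a plot). The class names and the use of `+` for side-by-side layouts and `*` for overlays loosely echo HoloViews' vocabulary, but everything below is a hypothetical illustration, not HoloViews' code or API.

```python
class Composable:
    """Anything that can be combined side by side (+) or on top of each other (*)."""

    def __add__(self, other):
        return Layout(_items(Layout, self) + _items(Layout, other))

    def __mul__(self, other):
        return Overlay(_items(Overlay, self) + _items(Overlay, other))


def _items(kind, obj):
    """Flatten nested composites of the same kind, e.g. (a + b) + c -> [a, b, c]."""
    return list(obj.items) if isinstance(obj, kind) else [obj]


class Element(Composable):
    """Holds data and a label; it describes itself instead of being drawn by hand."""

    def __init__(self, data, label):
        self.data = list(data)
        self.label = label

    def render(self):
        return f"{type(self).__name__}[{self.label}: {len(self.data)} pts]"


class Curve(Element):
    pass


class Scatter(Element):
    pass


class Overlay(Composable):
    def __init__(self, items):
        self.items = items

    def render(self):
        return "(" + " * ".join(item.render() for item in self.items) + ")"


class Layout(Composable):
    def __init__(self, items):
        self.items = items

    def render(self):
        return " | ".join(item.render() for item in self.items)


xs = range(5)
squares = Curve([x * x for x in xs], "x^2")
points = Scatter(xs, "x")
print((squares * points + squares).render())
# (Curve[x^2: 5 pts] * Scatter[x: 5 pts]) | Curve[x^2: 5 pts]
```

The notebook code only states *what* to show; the rendering logic lives with the objects themselves.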

Matthias Bussonnier and Kester Tong held a joint talk to highlight the collaboration between Jupyter and Google's Colaboratory project. I had never thought about it that way, but real-time collaboration in Jupyter is not a trivial task. The short version of the talk: it's a work in progress.

I heard that the Deep Learning talk by Kyle Kastner, which ran at the same time as Matthias' talk, was really good and worth watching. Link follows:

I also attended the talks on PyStruct and pgmpy, which were both about structured and probabilistic model prediction in Python. These talks went over my head, so I will have to rewatch them.

If you want to read summaries of other days, go here:

Day 2 Summary
Day 3 Summary
