This year I am attending SciPy, the conference on scientific computing with Python organized by Enthought. I am really happy that I chose SciPy as the conference to attend this year because there appears to be a lot of energy here. In the introduction and welcome words by the committee members, it was noted that conference attendance has grown from the 300s to about 600 over just the past three years.
I talked to some people who attended the tutorials and am told that the 8-hour scikit-learn tutorial was excellent. Video and Jupyter notebook links to follow:
Following are some quick notes on each of the talks I attended, as well as notes on some other notable things.
Chris Wiggins, Chief Data Scientist at the New York Times, gave today's keynote speech. His talk was illustrative because it showed how pervasive the need to convert data into information really is, so that even - or perhaps especially - traditional journalism needs data science. He showed how unsupervised learning, supervised learning, and reinforcement learning are used at the NYT to explore, learn, test, and optimize the user experience in support of quality journalism.
James Crist from Continuum Analytics talked about Dask, a parallel computing framework built on the numpy/scipy ecosystem. The emphasis of the talk was on how Dask can be used to analyze data sets that are too large to fit into memory. The use of blocked algorithms leads to significant reductions in memory usage as well as computation time. Dask consists of a graph specification module, a scheduler, and a collections module on top of the first two components. This was an excellent talk worth watching.
- https://pypi.python.org/pypi/dask/0.6.0
- http://continuum.io/blog/xray-dask
- github.com/ContinuumIO/dask
- https://speakerdeck.com/jcrist/pandas-through-task-scheduling
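To make the blocked-algorithm idea concrete, here is a minimal sketch using dask.array; the array shape, chunk size, and reduction are my own illustrative choices, not from the talk.

```python
import dask.array as da

# A 20,000 x 20,000 float64 array (~3.2 GB) described as 1000x1000 chunks;
# the full array is never materialized in memory at once.
x = da.random.normal(size=(20_000, 20_000), chunks=(1_000, 1_000))

col_means = x.mean(axis=0)        # builds a task graph, computes nothing yet
result = col_means[:5].compute()  # the scheduler runs only the chunks needed
print(result)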
Yannick Congo gave a superb talk despite not having visual aids (technology fail). His talk focused on tools used to make simulation data generation reproducible and transparent. Skip to 16-17 minutes into the talk to see the demonstration of his work. He uses Docker and Sumatra (?).
I attended the Visualization track in the afternoon, which was full of really, really good talks. All of them are worth watching.
Bryan Van de Ven gave us the state of the art of Bokeh. The whole idea behind Bokeh is to generate interactive graphs. This summary does not do the talk justice; just watch it for a visual guide to the capabilities. A few notes: development of these tools is progressing at lightning speed. Last year Bokeh was at v0.5; this year it's at 0.9.1, and in talking to Bryan after the talk, the goal is to reach 1.0 by roughly the end of the year. JS callbacks have made it back into Bokeh, the server is being reworked and refined again, and the goal is to write wrappers around common patterns as they are discovered. Bokeh bindings for R, JS, and Scala exist now!
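For a rough sense of what the bokeh.plotting interface looks like, here is a minimal sketch (the data and tool list are my own, not from the talk); the output is a standalone HTML file with pan/zoom interactivity.

```python
import numpy as np
from bokeh.plotting import figure, output_file, show

x = np.linspace(0, 4 * np.pi, 200)

output_file("sine.html")  # standalone HTML output
p = figure(title="Interactive sine curve",
           tools="pan,wheel_zoom,box_zoom,reset")
p.line(x, np.sin(x), line_width=2)
show(p)  # opens the interactive plot in a browser
```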
Luke Campagnola talked about VisPy, a module designed for visualization of large data sets using GPU acceleration via OpenGL. VisPy was developed because matplotlib was too slow for fast data acquisition and display.
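For flavor, a minimal sketch using VisPy's high-level plotting API, assuming the vispy.plot interface and an OpenGL-capable backend; the million-point waveform is just an illustrative stand-in for fast acquisition data.

```python
import numpy as np
from vispy import app
import vispy.plot as vp

# ~1 million points: enough that matplotlib starts to lag,
# while an OpenGL-backed canvas stays responsive.
x = np.arange(1_000_000)
y = np.random.normal(size=1_000_000).cumsum()

fig = vp.Fig(show=False)
fig[0, 0].plot((x, y), title="1M-point line plot")
fig.show()
app.run()
```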
Jean-Luc R., in my opinion, had the most intriguing talk, introducing HoloViews - a library of declarative objects that hold data, can be composed, and visualize themselves. The implication for scientists is that they don't necessarily have to learn matplotlib to view their data; they just compose the data they want to see. This process simplifies Jupyter notebooks.
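Here is a minimal sketch of that declarative style with my own toy data, assuming a recent HoloViews with the bokeh backend in a Jupyter notebook: elements wrap the data, and composition operators produce the display.

```python
import numpy as np
import holoviews as hv
hv.extension('bokeh')  # assumes the bokeh plotting backend is installed

xs = np.linspace(0, 4 * np.pi, 200)
curve = hv.Curve((xs, np.sin(xs)), label='sin')
scatter = hv.Scatter((xs[::10], np.cos(xs[::10])), label='cos samples')

# '+' lays elements out side by side, '*' overlays them; no plotting
# calls are written - in a notebook the final expression renders itself.
curve + curve * scatter
```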
Matthias Bussonnier and Kester Tong gave a talk together to highlight the collaboration between Jupyter and Google's coLaboratory project. I had never thought about it like that, but real-time collaboration using Jupyter is not a trivial task. The short of the talk: it's a work in progress.
At the same time as Matthias' talk, I heard that the Deep Learning talk given by Kyle Kastner was really good and worth watching. Link follows:
I also attended talks on PyStruct and pgmpy, both about structured/probabilistic model prediction using Python. These talks went over my head, so I will have to rewatch them.
If you want to read summaries of other days, go here:
Day 2 Summary
Day 3 Summary