
Recap: SciPy 2015 - Day 1

This year I am attending SciPy, a conference about scientific computing with Python organized by Enthought. I am really happy that I chose SciPy as the conference to attend this year because there appears to be a lot of energy here. In their introduction and welcome, the committee members noted that attendance has grown from the 300s to 600 over just the past 3 years.


I talked to some people who attended the tutorials, and I am told that the 8-hour scikit-learn tutorial was excellent. Video and Jupyter NB links to follow:

 
Following are some quick notes on each of the talks that I attended, along with a few other notable things.

Chris Wiggins, Chief Data Scientist at the New York Times, gave today's keynote speech. His speech was illustrative because it showed how pervasive the need to convert data into information really is: even traditional journalism, or perhaps especially traditional journalism, needs data science. He showed how unsupervised learning, supervised learning, and reinforcement learning are used at the NYT to explore, learn, test, and optimize the user experience in support of quality journalism.



James Crist from Continuum Analytics talked about Dask, a parallel computing framework that makes use of the numpy/scipy ecosystem. The emphasis of the talk was on how Dask can be used to analyze data sets that are too big to be loaded into memory. The use of blocked algorithms leads to significant reductions in memory usage as well as computation time. Dask consists of a graph specification module, a scheduler, and a collections module built on top of the first two components. This was an excellent talk worth watching.
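To make the idea of a blocked algorithm driven by a task graph more concrete, here is a minimal sketch in plain Python. It is not Dask's code or API: the helper names (`load_block`, `build_graph`, `get`, `blocked_sum`) and the string keys are my own assumptions, and `load_block` only pretends to read a slice from disk. The graph is a dict that maps keys to tasks, a task is a tuple of a function and its arguments, and a tiny recursive function plays the role of the scheduler.

```python
def load_block(start, stop):
    # Stand-in for reading one slice of a large on-disk data set (assumption).
    return list(range(start, stop))


def build_graph(n_items, block_size):
    """Describe a blocked sum as a dict of key -> (function, *arguments)."""
    graph = {}
    partial_keys = []
    for i, start in enumerate(range(0, n_items, block_size)):
        stop = min(start + block_size, n_items)
        graph[f"block-{i}"] = (load_block, start, stop)
        graph[f"partial-{i}"] = (sum, f"block-{i}")
        partial_keys.append(f"partial-{i}")
    # The final task combines the small per-block results.
    graph["total"] = (sum, partial_keys)
    return graph


def resolve(graph, arg):
    """Replace keys (or lists of keys) with their computed values."""
    if isinstance(arg, list):
        return [resolve(graph, a) for a in arg]
    if isinstance(arg, str) and arg in graph:
        return get(graph, arg)
    return arg


def get(graph, key):
    """Toy scheduler: run the task stored under `key`, resolving its inputs first."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        return func(*(resolve(graph, a) for a in args))
    return task


def blocked_sum(n_items, block_size):
    return get(build_graph(n_items, block_size), "total")


if __name__ == "__main__":
    print(blocked_sum(1000, 100))
```

Only one block is held in memory at a time while its partial sum is computed, which is where the memory savings come from. A real scheduler would also run independent blocks in parallel, which this sketch does not attempt.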

Robert Grant from Enthought talked about DistArray, which, interestingly, seemed very similar in concept to the previous talk. While the previous talk focused on out-of-core parallelization, this talk focused on the DistArray data structure, which is essentially a proxy object that keeps track of the local numpy-like arrays living on each of the worker machines. So DistArray is a data structure that enables work to be parallelized across physically separate worker machines. This talk is also worth watching to compare with the previous one. It seems that Continuum and Enthought are thinking along similar trajectories. By that I mean that they are trying to tackle similar challenges: how to cut down compute time and how to process large amounts of data.
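A rough way to picture the proxy idea is the toy class below. It is only a sketch under my own assumptions and not DistArray's actual API: the class name `ToyDistArray` and its methods are made up for illustration, and the "workers" are simulated as plain lists inside one process rather than separate machines.

```python
class ToyDistArray:
    """Proxy that tracks one local piece of data per (simulated) worker."""

    def __init__(self, local_pieces):
        # In a real system each piece would live on a different machine.
        self._pieces = [list(piece) for piece in local_pieces]

    @classmethod
    def from_sequence(cls, data, n_workers):
        """Split `data` into at most `n_workers` contiguous pieces."""
        data = list(data)
        size = max(1, -(-len(data) // n_workers))  # ceiling division
        return cls(data[i:i + size] for i in range(0, len(data), size))

    def map(self, func):
        """Each worker applies `func` to its own piece; no data is moved."""
        return ToyDistArray([func(x) for x in piece] for piece in self._pieces)

    def sum(self):
        """Workers reduce locally; the proxy combines the small results."""
        return sum(sum(piece) for piece in self._pieces)

    def gather(self):
        """Pull every piece back to the client as one flat list."""
        return [x for piece in self._pieces for x in piece]

    def __len__(self):
        return sum(len(piece) for piece in self._pieces)


if __name__ == "__main__":
    d = ToyDistArray.from_sequence(range(10), n_workers=3)
    print(d.map(lambda x: x * x).sum())
```

The point of the design is that the client only ever talks to the proxy, while element-wise work and partial reductions stay next to the data.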

Jessica Hamrick gave a talk about how Jupyter, JupyterHub and NBgrader can be used in a classroom environment to aid learning. For those interested in seeing how technology can enhance classroom learning, this is a highly interesting talk.


Yannick Congo gave a superb talk despite not having visual aids (technology fail). His talk focused on tools used to make the generation of simulation data reproducible and transparent. Skip to 16-17 minutes into the talk to see the demonstration of his work. He uses Docker and Sumatra (?).


I attended the Visualization track in the afternoon, which was full of really, really good talks. All of them are worth watching.

Bryan Van de Ven gave us the state of the art of Bokeh. The whole idea behind Bokeh is to generate interactive graphs. This summary does not do the talk justice, so just watch it for a visual guide to Bokeh's capabilities. A few notes: development of these tools is progressing at lightning speed. Last year, Bokeh was at version 0.5. This year it's at 0.9.1, and from talking to Bryan after the talk, the goal is to reach 1.0 by the end of the year-ish. Currently, a bit of JS callback code has made it back into Bokeh. The server is being reworked and refined again. The goal is to write wrappers around commonly discovered patterns. Bokeh bindings for R, JS and Scala exist now!


Luke Campagnola talked about VisPy, which is a module designed for visualization of large data sets using GPU acceleration via OpenGL. VisPy was developed because matplotlib was too slow for fast data acquisition and display. 


Jean-Luc R., in my opinion, had the most intriguing talk, introducing HoloViews, a library of declarative objects that hold data, can be composed, and visualize themselves. The implication for scientists is that they don't necessarily have to learn matplotlib to view their data. They just need to compose the data they want to see. This process simplifies Jupyter notebooks.
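To illustrate what "declarative objects that compose" could look like, here is a toy sketch in plain Python. It is not HoloViews code: the classes `Viewable`, `Curve`, `Overlay` and `Layout` are simplified stand-ins I wrote for this note, and `render` returns a text description instead of drawing a plot. The only thing borrowed from HoloViews is the convention that `*` overlays objects and `+` places them side by side.

```python
class Viewable:
    """Base class: every object can be overlaid (*) or laid out (+)."""

    def __mul__(self, other):
        return Overlay([self, other])

    def __add__(self, other):
        return Layout([self, other])


class Curve(Viewable):
    """Holds the data plus a label; knows how to describe itself."""

    def __init__(self, data, label):
        self.data = list(data)
        self.label = label

    def render(self):
        return f"Curve[{self.label}, {len(self.data)} pts]"


class Overlay(Viewable):
    """Several objects drawn on the same axes."""

    def __init__(self, items):
        self.items = list(items)

    def render(self):
        return "Overlay(" + " * ".join(i.render() for i in self.items) + ")"


class Layout(Viewable):
    """Several objects shown next to each other."""

    def __init__(self, items):
        self.items = list(items)

    def render(self):
        return "Layout(" + " + ".join(i.render() for i in self.items) + ")"


if __name__ == "__main__":
    signal = Curve([1, 2, 3], "signal")
    fit = Curve([2, 3, 4], "fit")
    residual = Curve([0, 1], "residual")
    # Say what to look at, not how to draw it.
    print((signal * fit + residual).render())
```

The user never writes plotting code here: composing the objects is the whole specification, and the rendering is left to the objects themselves.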



Matthias Bussonier and Kester Tong gave a joint talk highlighting the collaboration between Jupyter and Google's co-laboratory project. I had never thought about it like that, but real-time collaboration using Jupyter is not a trivial task. The short of the talk: it's a work in progress.


At the same time as Matthias' talk, I heard that the Deep Learning talk given by Kyle Kastner was really good and worth watching. Link follows:


I also attended talks on PyStruct and Pgmpy, both of which were about structured/probabilistic model prediction using Python. These talks went over my head, so I will have to rewatch them.



If you want to read summaries of other days, go here:

Day 2 Summary
Day 3 Summary
