
Recap: SciPy 2015 - Day 1

This year I am attending SciPy, a conference about scientific computing with Python organized by Enthought. I am really happy that I chose SciPy as the conference to attend this year, because there is a lot of energy here. In the welcome remarks, the committee members noted that attendance has grown from around 300 to 600 over just the past three years.


I talked to some people who attended the tutorials and was told that the eight-hour scikit-learn tutorial was excellent. Video and Jupyter notebook links to follow:

 
Following are some quick notes on each of the talks I attended, as well as a few other notable items.

Chris Wiggins, Chief Data Scientist at the New York Times, gave today's keynote speech. His talk illustrated how pervasive the need to convert data into information really is: even - or perhaps especially - traditional journalism needs data science. He showed how unsupervised learning, supervised learning, and reinforcement learning are used at the NYT to explore, learn, test, and optimize the user experience in support of quality journalism.



James Crist from Continuum Analytics talked about Dask, a parallel computing framework that builds on the numpy/scipy ecosystem. The emphasis of the talk was on how Dask can be used to analyze data sets that are too large to fit into memory. The use of blocked algorithms leads to significant reductions in memory usage as well as computation time. Dask consists of a graph specification module, a scheduler, and a collections module on top of the first two components. This was an excellent talk worth watching.
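To make the blocked-algorithm idea concrete, here is a minimal sketch using dask.array (my own example, not from the talk; the shapes and chunk sizes are made up for illustration):

```python
import dask.array as da

# Lazily define a 100000 x 10000 array of ones, split into 1000 x 1000 blocks.
x = da.ones((100000, 10000), chunks=(1000, 1000))

# Operations only add tasks to the graph; nothing is computed yet.
y = (x - x.mean(axis=0)) ** 2

# compute() hands the graph to the scheduler, which streams blocks through
# memory instead of materializing the full array at once.
print(y.sum().compute())
```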

Robert Grant from Enthought talked about DistArray, which, interestingly, seemed very similar in concept to the previous talk. While the previous talk focused on out-of-core parallelization, this one focused on the DistArray data structure, essentially a proxy object that keeps track of local numpy-like arrays on each of the worker machines. DistArray is thus a data structure for distributing work across physically separate workers. Also worth watching, if only to compare with the previous talk. It seems that Continuum and Enthought are thinking along similar trajectories: both are trying to tackle the challenge of cutting down compute time and processing large amounts of data.
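I have not used DistArray myself, so rather than guessing its API, here is a numpy-only toy sketch of the proxy idea: a lightweight object that only records which worker holds which local block and combines partial results.

```python
import numpy as np

class ToyDistArray:
    """Toy stand-in (not the DistArray API): split an array row-wise across workers."""
    def __init__(self, data, n_workers=4):
        # In DistArray the blocks would live on separate machines; here they are
        # just entries in a local list, one block per pretend worker.
        self.blocks = np.array_split(data, n_workers)

    def sum(self):
        # Each worker reduces its own block; the proxy combines the partial sums.
        return sum(block.sum() for block in self.blocks)

d = ToyDistArray(np.arange(1000000))
print(d.sum())  # same result as np.arange(1000000).sum()
```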

Jessica Hamrick gave a talk about how Jupyter, JupyterHub, and nbgrader can be used in a classroom environment to aid learning. For those interested in seeing how technology can enhance classroom learning, this is a highly interesting talk.


Yannick Congo gave a superb talk despite not having visual aids (technology fail). His talk focused on tools used to make simulation data generation reproducible and transparent. Skip to 16-17 minutes into the talk to see the demonstration of his work. He uses Docker and Sumatra (?).


I attended the Visualization track in the afternoon, which was full of really good talks. All of them are worth watching.

Bryan Van de Ven gave an update on the state of Bokeh. The whole idea behind Bokeh is to generate interactive graphs. A summary does not do the talk justice; just watch it for a visual guide to its capabilities. A few notes: development of these tools is progressing at lightning speed. Last year, Bokeh was at v0.5; this year it is at 0.9.1, and in talking to Bryan after the talk, the goal is to reach 1.0 by roughly the end of the year. JavaScript callbacks have made their way back into Bokeh, the server is being reworked and refined, and the goal is to write wrappers around commonly discovered patterns. Bokeh bindings for R, JavaScript, and Scala exist now!
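For readers who have not seen Bokeh, a minimal sketch of the plotting API looks roughly like this (written from memory against the 0.9-era interface; details may differ in current releases):

```python
from bokeh.plotting import figure, output_file, show

# Write a standalone HTML page; the plot in it is interactive (pan, zoom, save).
output_file("lines.html")

p = figure(title="simple line example", x_axis_label="x", y_axis_label="y")
p.line([1, 2, 3, 4, 5], [6, 7, 2, 4, 5], line_width=2)

show(p)  # opens the page in a browser
```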


Luke Campagnola talked about VisPy, which is a module designed for visualization of large data sets using GPU acceleration via OpenGL. VisPy was developed because matplotlib was too slow for fast data acquisition and display. 


Jean-Luc R., in my opinion, had the most intriguing talk, introducing HoloViews - a library of declarative objects that hold data, can be composed, and visualize themselves. The implication for scientists is that they don't necessarily have to learn matplotlib to view their data; they just compose the data they want to see. This also simplifies Jupyter notebooks.
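A rough sketch of what "composing the data you want to see" looks like with HoloViews (my own minimal example, not from the talk):

```python
import numpy as np
import holoviews as hv
hv.extension('bokeh')  # pick a plotting backend; 'matplotlib' also works

xs = np.linspace(0, 10, 100)
curve = hv.Curve((xs, np.sin(xs)), label='sin')
points = hv.Scatter((xs[::10], np.sin(xs[::10])), label='samples')

# The objects hold the data and know how to display themselves:
# '*' overlays elements, '+' places them side by side.
curve * points + hv.Curve((xs, np.cos(xs)), label='cos')
```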



Matthias Bussonier and Kester Tong gave a talk together highlighting the collaboration between Jupyter and Google's co-laboratory project. I had never thought about it this way, but real-time collaboration using Jupyter is not a trivial task. The short of the talk: it's a work in progress.


At the same time as Matthias' talk, I heard that the Deep Learning talk given by Kyle Kastner was really good and worth watching. Link follows:


I also attended talks on PyStruct and pgmpy, both about structured/probabilistic model prediction using Python. These talks went over my head, so I will have to rewatch them.



If you want to read summaries of other days, go here:

Day 2 Summary
Day 3 Summary
