SciPy 2015 has come and gone. If I step back, what are some of the lessons learned?
There were certain themes that recurred from talk to talk:
- Speed. One of the perceived limitations of Python is speed of execution, which matters when processing very large datasets. Many talks dealt with this topic in various ways. Approaches included process parallelization (Dask, DistArray), GPU acceleration (VisPy), and acceleration via some form of compilation, sometimes just-in-time (Numba). With these tools, Python no longer has to be slow. It's impressive that the combination of these approaches lets data scientists process 60+ GB datasets as if they were loaded into memory, on a small laptop that actually has only 8-16 GB of memory.
- Visualization. Many talks dealt with making complex datasets visible, and they addressed different issues: serialization to enable interactivity (Bokeh, matplotlib), visualizing large dynamic datasets with low latency (VisPy), and making it easier for everyday scientists to view and share plots (HoloViews).
- Reproducibility and transparency of data analysis in science, including the concept of literate programming. Again, different solutions were presented. They ranged from ensuring that an analysis runs on the same configuration at different levels (Docker, Dexter), to striking the right balance between sufficient metadata and beautiful formatting, to reducing error rates by automating the execution of different simulation scenarios.
- The scientific Python ecosystem is developing at breakneck speed. Traditionally, R has been said to be superior to Python in the availability of specialty packages that do one specific thing. I heard again and again that this difference is rapidly disappearing. And because Python is a more general-purpose programming language, there are things Python code can do that R code cannot. Many quality packages are being developed rapidly. Just take Bokeh, for instance. Last year it was at version 0.5. The module is intended to reach production quality with the release of version 1.0 later this year. And this is just one example.
- The language wars are over. Having made the previous point, I will add that another fascinating theme was moving past the language wars. Instead of trying to convince X that language Y is the best and everything else is worthless, there is a desire to go beyond that. Bokeh, for example, now has bindings for R and Julia. The evolved IPython Notebook, now called Jupyter (a name that stands for Julia, Python and R), is now language agnostic. Support for an individual language is added by loading a specific kernel. Currently, the most active kernel development is in the R and Julia space. There is also discussion about a universal data storage structure that could be shared among at least these 3 languages. These ideas have started to be seriously discussed in the scientific Python community. Scientific Python developers have started reaching out to the R and Julia communities, but these discussions are still at a very early stage.
- Python and teaching. The IPython Notebook and the ecosystem developing around it are great tools for teaching students science, programming, and more.
- Develop in Python 3! Last but not least: in the past, Python 3 migration and development always hinged on the availability of Python 3 compatible modules. Even last year this was still true. This year, there was universal consensus that Python 3 is no longer the future but the present. The time to migrate is now!
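To make the Speed theme a little more concrete, here is a minimal sketch of the general idea behind working with data that does not fit in memory: stream it through in small chunks and keep only running totals. This is my own illustration using just the standard library. It is not how Dask or any other tool mentioned above is implemented, and the function name `chunked_mean`, the `value` column and the `chunk_size` parameter are made up for the example.

```python
import csv
import io


def chunked_mean(rows, column, chunk_size=1000):
    """Mean of one column, holding at most chunk_size values in memory."""
    total = 0.0
    count = 0
    chunk = []
    for row in rows:
        chunk.append(float(row[column]))
        if len(chunk) >= chunk_size:
            # Reduce the full chunk to two running numbers, then discard it.
            total += sum(chunk)
            count += len(chunk)
            chunk.clear()
    # Fold in the final, possibly partial chunk.
    total += sum(chunk)
    count += len(chunk)
    if count == 0:
        raise ValueError("no rows to average")
    return total / count


# Tiny in-memory stand-in for a CSV file that would not fit in RAM.
sample = io.StringIO("value\n1\n2\n3\n4\n")
result = chunked_mean(csv.DictReader(sample), "value", chunk_size=2)
print(result)
```

With a real file you would pass `csv.DictReader(open(path))` instead of the in-memory sample. The libraries from the talks go much further (parallel scheduling, lazy task graphs, compiled inner loops), but the underlying trick of never materializing the whole dataset at once is the same.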
The organizers outdid themselves in putting together such a well-run conference. As mentioned earlier, within just the last 3 years the conference has grown from 300 to more than 600 attendees. SciPy is no longer small. Despite this, the conference has retained the atmosphere of a small family meeting. At the very least, the growth in attendance shows the increased interest in using Python for all kinds of data science computation.
Beyond this, the combination of high-quality talks and a good balance with social interaction has made this my default conference to attend over the next few years! See you all in 2016!