JuliaCon 2017 Recap
So JuliaCon 2017 has come and gone, and as my first (non-school or work related) conference in a number of years, it was a fun change of pace. The Julia community is still a relatively small one; total conference attendance was, I believe, no more than 300 people. That said, the small size made for a far more approachable community, and for that matter, an actual sense of community. I got to meet a lot of very interesting people working on some very interesting projects, some of which may be in my future depending on how things unfold. But more on that later in the post.
First off, for those of you interested in the proceedings of the conference, there are two resources you can use to learn about the different topics discussed:
The Julia Language YouTube channel - The full set of videos may take some weeks to become available, and to be honest I have my doubts about the audio quality to expect, but all the material should be there.
The JuliaCon 2017 website - This site has written summaries of all the talks presented (but apparently not the keynotes or sponsor talks), along with slides from the presenters who provided them to the conference organizers.
Highlights
Modia
By far, the presentation I was most personally excited about was the Modia project presented by Hilding Elmqvist. The talk summary (although, sadly, not the slides) can be found here, and the GitHub repository (which does have more background material, but as of yet no actual source code) can be found here. In a nutshell, Modia is intended to be the next evolutionary step of the Modelica modeling language: the user provides the modeling environment with the general equations governing the physics of the problem, and the modeling environment then figures out how to solve that system of equations given the inputs provided. Modelica is already an excellent choice for modeling physical systems, but is apparently hitting performance constraints, so Modia, and its implementation in Julia, is intended to break through those limits.
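To make the equation-based idea concrete, here's a minimal sketch in plain Julia using NLsolve.jl: you state the governing equations as residuals and let a solver find the unknowns, rather than writing out the solution procedure yourself. Since Modia's source wasn't public at the time, this is only an illustration of the concept, and the toy circuit values are made up.

```julia
# A toy steady-state "model": a voltage divider written as residual
# equations rather than an explicit solution procedure. All values are
# made up; NLsolve.jl stands in for the equation-solving machinery.
using NLsolve

V_in, R1, R2 = 12.0, 100.0, 200.0    # supply voltage [V], resistances [Ω]

function residuals!(F, x)
    i, V_out = x
    F[1] = V_in - i * (R1 + R2)      # Kirchhoff's voltage law around the loop
    F[2] = V_out - i * R2            # Ohm's law across R2
end

sol = nlsolve(residuals!, [0.0, 0.0])
println(sol.zero)                    # → i ≈ 0.04 A, V_out ≈ 8.0 V
```

A true equation-based environment goes much further, of course (differential equations, automatic causality analysis, symbolic simplification), but the workflow of "write the physics, let the tool solve it" is the core idea.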
What I'm most excited about within this project is the opportunity for controls development. The difference from physical modeling is the difference between modeling the lift, drag, and thrust of an airplane (the plant model) and modeling how the flaps, ailerons, rudder, etc. should move to make sure the plane flies in the direction you intended (the controls model). Right now, the premier tool for controls development (as much as it may pain some people to hear) is Simulink. Having two different tools isn't necessarily a problem, especially since getting Modelica (or its proprietary version, Dymola) to run inside Simulink is more or less a solved problem. There are, however, a few main problems I can see with this approach.
- Performance - For some reason, Simulink's interfaces to other programs, even when compiled into S-functions (essentially programs that are embedded into Simulink), are painfully slow. Some of this is likely a matter of synchronization (the two programs need to exchange information back and forth, which is a slow process), but in general, I'd expect at least an order of magnitude slowdown when using something like a Dymola function. Accurate results are great, but waiting forever for them drastically reduces the usefulness of models.
- Transparency - Beyond having to switch between environments to understand the full behavior of the system, if Modelica is compiled into an S-function in order to work in Simulink, then end users without access to the original Modelica model may have no way of figuring out why a certain behavior occurs in the model, and no way to decide whether to believe the simulation or to label it a bug. This can dramatically slow model development and erode trust in the model, both of which can cost such a modeling effort dearly.
- Cost - Yes, it'd be hard to talk about these two tools without discussing cost. My very rough estimate is that you'd be paying in excess of $20,000 for commercial licenses for both Simulink and Dymola. This is clearly industrial software, and that makes sense since most of the customers for it are large corporations. However, there's no reason a similar tool wouldn't be useful to an engineering startup, where this pricing (which is per user, mind you) could be extremely prohibitive. It also means that, even if developed open-source, any work produced with these tools is effectively behind a paywall. This is a little less true for Dymola, since it shares a lot with Modelica, which is open source, but compatibility there is a bit of a crapshoot.
So, if Modia can match or exceed Modelica/Dymola at modeling physical systems, having a companion package for controls on par with Simulink would be a game-changer. Now, this is far easier said than done. MathWorks has been working on Simulink for decades and it's a very capable piece of software, and I suspect the same goes for Dymola. And, while it's not well advertised, MathWorks does have a product similar to Dymola called Simscape for those willing to pay for it. But, in an ideal world where I could snap my fingers and make things happen, having both of these tools in a single fully-featured, fast language, with no licensing fees, would open doors.
Consider the following hypothetical example. I want to design a gas engine for a car. Currently I have several programs that model the individual behaviors (fuel consumption, emissions, power output, etc.), along with other programs that model how to control the engine in a vehicle. Even if these models can all be linked together (and my hunch is that they can't be), each one is a black box to the others; none has any idea how the others work. In order to design this engine, I may need to limit my work to only my particular part of the problem, as trying to bring in everyone would create an exponential explosion of complexity as results have to be shared and managed manually across groups. The best I can do is meet the objectives for my part of the engine, throw my results over the wall to the next group, and hope everything works out. And let's just hope there are no unforeseen interactions that will lead to costly, last-minute alterations in design.
If, however, all the physics of the problem are captured in Modia and other Julia packages using the Modia environment, and the controls are also linked to this combined model, you can do something really powerful: you can start to optimize. Julia already has fantastic packages built up for optimization, and since everything lives in one place, we can start to ask questions like "what are the tradeoffs between weight and power output?" or "how much do different emission targets impact our fuel economy?". The question fundamentally changes from "how do we design this engine?" to "which engine do we want to design?". That is supremely powerful, and without trying to overhype, it's simply one way in which such an integrated environment could change the typical engineering process. A toy sketch of such a tradeoff study follows. Clearly, if all goes as planned, you'll be hearing more about this ;)
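Here's a minimal sketch of what I mean, using JuMP with the Ipopt solver. The "engine model" is a made-up linear placeholder (every coefficient, bound, and the helper function name are hypothetical), but it shows the shape of the question: sweep a requirement and trace out the tradeoff curve.

```julia
# A toy tradeoff study. The "engine model" below is a made-up linear
# placeholder, not real physics; all coefficients and bounds are
# purely illustrative.
using JuMP, Ipopt

function min_mass_for_power(power_target)
    model = Model(Ipopt.Optimizer)
    set_silent(model)
    @variable(model, 1.0 <= displacement <= 4.0)   # engine displacement [L]
    @variable(model, 60.0 <= mass <= 300.0)        # engine mass [kg]
    # Pretend power output scales with displacement and engine mass
    @expression(model, power, 40.0 * displacement + 0.5 * mass)
    @constraint(model, power >= power_target)      # meet the power requirement
    @objective(model, Min, mass)                   # with the lightest engine
    optimize!(model)
    return value(mass)
end

# Sweep the power requirement to trace out the weight/power tradeoff curve
for p in 200.0:50.0:300.0
    println("power ≥ $p kW  →  min mass ≈ $(round(min_mass_for_power(p), digits = 1)) kg")
end
```

In a real integrated model, the placeholder expression would be replaced by the actual plant and controls models, and the optimizer would explore the design space for you.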
The Celeste Project
This was a fantastic presentation on what a bunch of determined Julia coders (Julians?) are capable of. The project itself is available here, but as a brief summary, Celeste reads in data from the Sloan Digital Sky Survey in order to categorize all the celestial objects captured in the survey images. This includes not just distinguishing stars of varying levels of brightness, but distinguishing them from galaxies, and if it is a galaxy, determining the type and shape of the galaxy. Did I mention this database is measured in terabytes and required a supercomputer to process?
But Celeste pulled it off with a combination of code and compiler optimizations. Some of the numbers shared by the team were really staggering. First, they mentioned that Celeste was the first program not written in C or Fortran to break the teraflop barrier. I'm not sure what level of "first" this is (on that supercomputer? that they're aware of? definitively for the world?) but it's still a rarefied atmosphere of programming languages capable of producing code of this speed. Second, they mentioned that Celeste successfully loaded 177 TB of data into active memory. Maybe this is more common in the supercomputer world, but thinking back to my Matlab experience, I remember that system struggling with files in the range of a few hundred MB. Finally, they mentioned that their original, decently optimized code completed their standard benchmark in 320 seconds, or just over 5 minutes. After some help from the Julia core team, they were able to reduce that time by an order of magnitude, down to 17 seconds.
Oh, and did I mention that the plan is to roll the compiler optimizations from this project into the base version of the language? And that everyone will be able to benefit from this work? Obviously, I was both very impressed and excited about the future of Julia's performance.
Web development, plotting, and optimization
This was more a collection of things I found interesting that I felt really demonstrated the flexibility of the language. Being able to interact with the internet is a must for any major programming language, and Julia has started to gain that capability. Web development isn't my strength, but I found it interesting that Julia can be used this way, and I'm looking forward to what people are able to do with it.
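Even without doing full web development, the basics are already easy. Here's a minimal sketch of web interaction from Julia, assuming the HTTP.jl package (one of the packages in this space):

```julia
# A minimal sketch of talking to the web from Julia with HTTP.jl.
using HTTP

resp = HTTP.get("https://julialang.org")
println(resp.status)                  # HTTP status code, e.g. 200
println(first(String(resp.body), 200))  # first 200 characters of the page body
```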
Plotting is similarly unsexy, but it's a must for any language that wants to be used for scientific and technical work. If you can't use it to show fancy plots in a journal, at a conference, or to your corporate higher-ups, then what's the point (no, seriously, effective communication is a core skill)? Previously, a lot of Julia code used Python libraries to do plotting. Those are well-known, robust, well-tested libraries, but over time Julia has been transitioning to graphics packages written in Julia itself. The original Python packages are still an option (Julia's plotting interface uses swappable graphical backends, so you can change which plotting package you're using on the fly), but I see it as something of a "growing up" moment that there are now native Julia options. And from the impression I got, they're now preferred over the Python ones, for whatever that's worth.
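For example, here's a minimal sketch of backend switching with Plots.jl (assuming both the GR and PyPlot backends are installed): the plotting code stays the same while the rendering engine changes underneath.

```julia
using Plots

gr()                                  # use the native GR backend
plot(sin, 0, 2π, label = "sin(x)")

pyplot()                              # switch to the Python matplotlib backend
plot(sin, 0, 2π, label = "sin(x)")    # identical plotting code, new renderer
```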
The optimization packages (along with the differential equation packages) are something I'll have to check out in the near future, but they seem to be among the crown jewels of the Julia community. Their continued development, and the sheer size of that effort, is very exciting for a true believer in optimization as a key engineering tool. I suspect, sooner or later, I will become very familiar with those packages.
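Since I showed an optimization sketch above, here's the differential equations side for balance: a minimal DifferentialEquations.jl example solving the classic exponential-decay ODE, just to show how little ceremony is involved.

```julia
using DifferentialEquations

f(u, p, t) = -u                          # du/dt = -u
prob = ODEProblem(f, 1.0, (0.0, 5.0))    # u(0) = 1 on t ∈ [0, 5]
sol = solve(prob)
println(sol(5.0))                        # ≈ exp(-5) ≈ 0.0067
```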
As for my full schedule, for those wondering, I chose this track:
- Tuesday
    - An Invitation to Julia: Toward Version 1.0
    - The Unique Features and Performance of DifferentialEquations.jl
    - Optimization and Solving Systems of Equations in Julia
    - NLOptControl.jl a Tool for Solving Nonlinear Optimal Control Problems
- Wednesday
    - Keynote: Computation and data in a polyglot world
    - Pkg3: Julia's New Package Manager
    - Diversity and Inclusion at JuliaCon and in the Scientific Computing Community
    - Query.jl: Query Almost Anything in Julia
    - Julia: a Major Scripting Language in Economic Research?
    - JLD2: High-performance Serialization of Julia Data Structures in an HDF5-compatible Format
    - TheoSea: Theory Marching to Light
    - Using Parallel Computing for Macroeconomic Forecasting at the Federal Reserve Bank of New York
    - The Dolo Modeling Framework
    - DataStreams: Roadmap for Data I/O in Julia
    - Equations, inequalities and global optimisation: guaranteed solutions using interval methods and constraint propagation
    - LightGraphs: Our Network, Our Story
    - Junet: Towards Better Network Analysis in Julia
    - Julia for Seismic Data Processing and Imaging (Seismic.jl)
- Thursday
    - Keynote: Decision Making under Uncertainty
    - The State of the Type System
    - GR Framework: Present and Future
    - GLVisualize 1.0
    - QML.jl: Cross-platform GUIs for Julia
    - The Celeste Project
    - Stochastic Optimization Models on Power Systems
    - Automatically Deriving Test Data for Julia Functions
    - The Julia VS Code Extension
    - Modia: A Domain Specific Extension of Julia for Modeling and Simulation
    - Mixed-Mode Automatic Differentiation in Julia
    - Cows, Lakes, and a JuMP Extension for Multi-stage Stochastic Optimization
    - Applications of Convex.jl in Optimization Involving Complex Numbers
    - Solving Geophysical Inverse Problems with the jInv.jl Framework: Seeing Underground with Julia
- Friday
    - A Superfacility Model for Data-Intensive Science
    - Taking Vector Transposes Seriously
    - Full Stack Web Development with Genie.jl
    - Web Scraping with Julia
    - WebIO.jl: a Thin Abstraction Layer for Web Based Widgets
    - Nulls.jl: Missingness for Data in Julia
    - Julia Roadmap
    - Julia: The Type of Language for Mathematical Programming
    - TaylorIntegration.jl: Taylor's Integration Method in Julia
    - L1-penalized Matrix Linear Models for High Throughput Data
    - Julia on the Raspberry Pi
    - GraphGLRM: Making Sense of Big Messy Data
    - The Present and Future of Robotics in Julia
    - MultipleTesting.jl: Simultaneous Statistical Inference in Julia
    - JuliaDB
During the conference, I ran across a manager from MathWorks who had come (presumably) to learn about Julia and why people were gravitating towards it. I spoke with him for a while as someone who's used Matlab for years and has become interested in Julia. During that conversation, I made the admittedly very broad claim that Julia just felt like a "real" programming language. He asked me what I meant by that (a fair question) and I promised to think about it and give him a thought-out answer later. At the end of the conference, while writing that email and reflecting on the week, I realized what I meant. The truly high-powered packages in Matlab are (so far as I know) mostly or all written in other languages, like C, Fortran, or Java; Matlab simply doesn't have the efficiency or expressive power to implement all of its own features. Julia, in contrast, has virtually all of its core packages written in Julia. With Julia I have a "complete" toolset; with Matlab, there will always be compromises. Again, I'll say that Matlab still has its place, but I wanted to share this observation, which I also shared with MathWorks.
Other than that, it was a lot of getting to know people (AKA, networking), occasionally getting frustrated at presentation styles (or lack thereof), and checking out Berkeley (and eating way too much poutine... I guess I am still a recovering Midwesterner). I'm really glad I went, learned a bunch, met some good people, and maybe even found a couple projects worth putting professional effort into. Until next year I suppose :)