Data Visualization Library in Swift

KarthikRIyer · May 30, 2019, 7:56pm

Hello everyone!

I am a Google Summer of Code candidate at TensorFlow this year, and my project is to develop a data visualization library in Swift. The aim of the project is to make a plotting framework in Swift that works cross-platform.

I’ve begun working on it and have implemented Line Chart as of now, and I will be implementing more plots as per my proposal.
Here is a link to the GitHub repo: https://github.com/KarthikRIyer/swiftplot
Here is a link to my proposal: https://github.com/KarthikRIyer/GSoC-proposal

I am very excited to be working on this project and am eager to hear your thoughts about it!

phren0logy · May 30, 2019, 9:19pm

This is not my area of expertise, but in the time I was working with R, I found ggplot2 to be a great library. The explanation behind its design is described here: http://vita.had.co.nz/papers/layered-grammar.html

I’ll put it out there as a potentially instructive way of thinking about plots and graphics at a lower level than in your proposal, which might allow it to be more flexible for future applications.

Best of luck, and thanks in advance for your work!

KarthikRIyer · June 1, 2019, 6:09am

Thanks a lot for the paper @phren0logy!

Anders · June 3, 2019, 8:55pm

Very cool! I really like what you’ve done so far.

Here are some thoughts on ways you could make it even easier to use.

According to the README, here’s how you create a line graph:
var lineGraph: LineGraph = LineGraph()
lineGraph.addSeries(x, y, label: "Plot 1", color: .lightBlue)
plotTitle.title = "SINGLE SERIES"
lineGraph.plotTitle = plotTitle

Then, to save it in a file:
lineGraph.drawGraphAndOutput(fileName: "lineChartMultipleSeries", renderer: agg_renderer)

Or to display it in a Jupyter notebook:
lineGraph.drawGraph(renderer: agg_renderer)
display(base64EncodedPNG: agg_renderer.base64Png())

Given that a lot of the time folks will be producing a bunch of plots, it might be worth making these changes if they aren’t too much of a pain to do:

First, instead of (or in addition to) creating the title and then adding it to the plot, could you add a method that does it in one shot, e.g. lineGraph.addPlotTitle("SINGLE SERIES")?

Second, most of the time users will want just one renderer in a script, so why make them specify it for every plot? And given how wonderfully clear and straightforward everything else is for creating a line plot, having to add "renderer: agg_renderer" or "base64EncodedPNG: agg_renderer.base64Png()" is going to look a little weird and a little intimidating to beginners.

Can you set it up so users can define it just once at the beginning of the script?
Can you add a method so that instead of needing two lines to display the graph in a notebook, you can do it with one, e.g. lineGraph.draw() or lineGraph.display()?
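
To make the three suggestions above concrete, here is a minimal sketch. Everything in it is hypothetical: the Renderer protocol, StubRenderer, PlotContext, and this LineGraph are illustrative stand-ins, not SwiftPlot's actual API, and the stub renderer only returns a summary string instead of drawing anything.

```swift
// Hypothetical stand-ins for illustration only; not SwiftPlot's actual types.
protocol Renderer {
    // Returns some encoded representation of the drawn plot.
    func render(title: String, seriesCount: Int) -> String
}

// A fake renderer that just summarizes what it was asked to draw.
struct StubRenderer: Renderer {
    func render(title: String, seriesCount: Int) -> String {
        return "\(title): \(seriesCount) series"
    }
}

struct LineGraph {
    var title = ""
    var series: [(x: [Float], y: [Float], label: String)] = []

    mutating func addSeries(_ x: [Float], _ y: [Float], label: String) {
        series.append((x: x, y: y, label: label))
    }

    // Suggestion 1: set the title in one call.
    mutating func addPlotTitle(_ title: String) {
        self.title = title
    }
}

// Suggestion 2: choose the renderer once, at the top of the script.
struct PlotContext {
    let renderer: Renderer

    // Suggestion 3: draw and produce the output in a single call.
    func display(_ graph: LineGraph) -> String {
        return renderer.render(title: graph.title, seriesCount: graph.series.count)
    }
}

let plots = PlotContext(renderer: StubRenderer())

var lineGraph = LineGraph()
lineGraph.addSeries([0, 1, 2], [0, 1, 4], label: "Plot 1")
lineGraph.addPlotTitle("SINGLE SERIES")
let output = plots.display(lineGraph)
```

Holding the renderer in a small context value rather than in a global keeps the "set it once" convenience while still letting a script create a second context for a different renderer.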

All small stuff, but if you succeed and this library turns into a tool that a ton of people are using, little tweaks like this could make it even easier to use and teach than it already is.

So glad you’re working on this! Looking forward to seeing what you produce.

kovasb · June 3, 2019, 9:05pm

Great to see someone working on this.

As it happens, I was taking a look into the same topic a few weeks ago.

One suggestion: Have you looked into Cairo? It has actively maintained Swift bindings: https://github.com/PureSwift/Cairo

IMO, one feature of matplotlib etc that is worth avoiding is multiple backends. Cairo seems the most widely supported & easy to interface with of the various options.

Good luck!

kovasb · June 3, 2019, 9:11pm

One more unsolicited suggestion.

A bunch of complexity in doing a from-scratch viz library is in dealing with labels, legends, & more generally, alignment of multiple subelements.

There are a number of implementations of the cassowary layout algorithm in Swift (ex: https://github.com/inamiy/Cassowary) that could make life quite a bit simpler.

bradlarson · June 3, 2019, 9:52pm

With the right level of abstraction, I don’t think that multiple backends will add a tremendous amount of complexity, and should allow for the most flexibility in application. The backend should only handle the simplest primitives (line/box/etc. drawing, text rendering), with layout and all other logic occurring in the higher-level Swift code.

By supporting multiple backends, this should give the greatest platform flexibility. For example, Mac and iOS have Core Graphics as a native vector and rasterization framework, so you wouldn’t need AGG support there and could even avoid building it in to minimize application size. However, Core Graphics isn’t present elsewhere, so AGG might be needed for Linux and Windows. If you wanted to use this in an application that needed high-performance constantly updating graphs (sensor readouts, etc.), you might opt for an OpenGL / Vulkan / Metal rendering backend.
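
As a rough illustration of that split, here is a minimal sketch in which the backend exposes only drawing primitives and the chart code does all the layout. The names (DrawingBackend, RecordingBackend, LineChart) are hypothetical and are not taken from SwiftPlot; the recording backend stands in for a real one such as AGG or Core Graphics and only logs which primitive was requested.

```swift
// Hypothetical sketch; the protocol and type names are illustrative only.
struct Point {
    var x: Double
    var y: Double
}

// The backend knows nothing about charts, only about primitives.
protocol DrawingBackend {
    mutating func drawLine(from start: Point, to end: Point)
    mutating func drawRect(origin: Point, width: Double, height: Double)
    mutating func drawText(_ text: String, at position: Point)
}

// Stand-in backend that records the primitives it was asked to draw.
struct RecordingBackend: DrawingBackend {
    var commands: [String] = []

    mutating func drawLine(from start: Point, to end: Point) {
        commands.append("line")
    }

    mutating func drawRect(origin: Point, width: Double, height: Double) {
        commands.append("rect")
    }

    mutating func drawText(_ text: String, at position: Point) {
        commands.append("text:\(text)")
    }
}

// Layout lives in backend-independent Swift code.
struct LineChart {
    var title: String
    var points: [Point]
    var width: Double = 100
    var height: Double = 100

    func draw<B: DrawingBackend>(on backend: inout B) {
        // Plot border, then the title, then one segment per adjacent pair of points.
        backend.drawRect(origin: Point(x: 0, y: 0), width: width, height: height)
        backend.drawText(title, at: Point(x: width / 2, y: height + 10))
        for (start, end) in zip(points, points.dropFirst()) {
            backend.drawLine(from: start, to: end)
        }
    }
}

var backend = RecordingBackend()
let chart = LineChart(
    title: "Demo",
    points: [Point(x: 0, y: 0), Point(x: 50, y: 25), Point(x: 100, y: 100)]
)
chart.draw(on: &backend)
```

Because LineChart only calls the three protocol methods, swapping in another backend would not touch the layout code.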

With the right design, I believe this could provide drop-in graphs for iOS applications, matplotlib-like plots in Jupyter notebooks, plots for server-side Swift web applications, or live training graphs in a pop-up window for a Linux-based Swift for TensorFlow application.

For more on this, the discussions that led to this design can largely be found in this thread from the Swift for TensorFlow mailing list.

kovasb · June 3, 2019, 10:31pm

I appreciate that others might have a different POV on this. I’m mostly concerned with the UX of getting high-quality data visualizations for DS/ML type work.

Abstracting the renderer isn’t a big deal. These are well-trodden concepts at this point. The complexity cost that drives my comments is more about build/install/bundle/config complexity.

The typical problem I have in Python: I launch Jupyter in some new environment that I don’t control, and the plots look terrible bc the backend has been set up wonky. Debugging this as a user is a real drain bc of all the interactions with graphics systems, native code, versions of everything, configs in various places, etc. And the documentation & code paths are not something you want to be wading through.

Related to the above: Getting pixel-reproducible plots is important for some people, and having plots look different across platforms may be a bug for some use cases.

A big question here is, what is the use case / target audience? If it is data visualization for data scientist/modeling type work, nothing is lost by having a single well-supported backend. If it’s native-enabled cross-platform charting, then multiple backends (and all the associated complexity) are a price you have to pay.

KarthikRIyer · June 4, 2019, 6:59pm

Having cross-platform charting is one of the primary aims of this project.

KarthikRIyer · June 4, 2019, 7:17pm

Thanks a lot for the appreciation @Anders!

I think it does make sense to add an addPlotTitle() function.

Regarding defining a renderer for each plot and writing fewer lines for plotting:
At the moment I’m trying to get a few basic plots working. I will address this in a future PR.
Discussion regarding this can be found here.