On D3
This post is going to be a collection of my observations about d3 - both its API and its source code as I explored using it for some personal work.
My requirements when coming to d3 was to draw charts. In fact, before actually getting my feet wet with it, all my interactions with other developers led me to categorize d3 as a charting API.
After actually exploring it however, I realize now that I was wrong. d3 is not a charting API so much as it is a generalized data visualization API. Specifically, d3 is an API to transform data into SVG data (and manipulate SVG DOM). I think we ought to stop refering to D3 as a “charting API” in common parlance and say something more accurate, like “D3 is to SVG (and to some extent, HTML too) what jQuery is to HTML”.
To put a finer point on it, when I started exploring d3, I expected to find a data driven API, basically a function that would accept some data, presumably in JSON and output a graph based on it. However, what I found instead is functional Javascript API that constructs SVG data using a series of transformations of input data.
The API
I decided to start with the API itself since that is what a d3 user, usually a developer encounters straight off the bat.
I am going to evaluate the API (and provide my observations thereof) based on three pre-determined criteria:
- how declarative is the API?
- how often do “patterns” in use of the API repeat across multiple visualizations?
How declarative is the API?
Consider the introductory tutorial code for creating a bar chart using d3:
var data = [30, 86, 168, 281, 303, 365];
d3.select(".chart")
.selectAll("div")
.data(data)
.enter()
.append("div")
.style("width", function(d) { return d + "px"; })
.text(function(d) { return d; });
An observation that can be made straight off the bat is that d3 is declarative - i.e. the use of the API involves “specifying” a series of transformations (chained functions) rather than issuing a series of function calls (in an imperative style).
Another point to note is that d3 does not just concern itself with generating SVG. The above example is generating a series of rectangular ‘div’ elements. So d3 does HTML DOM manipulation as well as SVG DOM manipulation.
How often do “patterns” in use of the API repeat across multiple visualizations?
This question bears answering since the answer determines the ease of learning this API to the point where you can weild it to generate more customized visualizations.
The answer would be that primarily, learning of the d3 API involves learning of the ‘core’ d3 module, comprising most importantly of selections, transitions and data manipulation functions. The rest of the API comprises of a series of utilities: to interpolate data, perform geospatial calculations and transformations, with some often needed geometry primitives thrown in.
While I am no d3 expert, getting the d3 selection API down would pretty much let you weild d3 with ease, since these selection patterns are what you find repeating across visualizations.
The Code
Code Organization
Instead of a monolith-library, d3 is an umbrella of distinct libraries with clear separation of concerns. At the top level is a repo named d3/d3
which is primarily just a meta-repo pulling in all the other d3 dependency repos.
Each of the other libraries are in repos of their own. Some of them are:
- d3/d3-core
- d3/d3-shape
- d3/d3-scale
- d3/d3-hierarchy
- …
Code Exploration
As an exercise to determine whether d3 code is easily hackable, I decided to take a look at two functions that seem most intriguing in the d3 API: data()
and enter()
. To me these really seem the essence of the d3 API, because it is in these functions, where the magic of the correlation between the input data and the generated SVG/HTML DOM nodes seems to occur.
The other functions, as I see them, are a bunch of selectors and setters of HTML/SVG elements/attributes and the rest are utilities (like data interpolation and date formatting) which I can find other utility libraries to do anyway.
As a complete beginner to d3, it took me less than 30 seconds to track down the source code to data()
. That speaks volumes about how good the code organization is in this project.
It appears that most code in d3 is so modularized that most files never exceed 200 lines or so. That is again an impressive feat considering d3’s size.
The following was the code for the default exported function from the data.js
file.
export default function(value, key) {
if (!value) {
data = new Array(this.size()), j = -1;
this.each(function(d) { data[++j] = d; });
return data;
}
var bind = key ? bindKey : bindIndex,
parents = this._parents,
groups = this._groups;
if (typeof value !== "function") value = constant(value);
for (var m = groups.length, update = new Array(m), enter = new Array(m), exit = new Array(m), j = 0; j < m; ++j) {
var parent = parents[j],
group = groups[j],
groupLength = group.length,
data = value.call(parent, parent && parent.__data__, j, parents),
dataLength = data.length,
enterGroup = enter[j] = new Array(dataLength),
updateGroup = update[j] = new Array(dataLength),
exitGroup = exit[j] = new Array(groupLength);
bind(parent, group, enterGroup, updateGroup, exitGroup, data, key);
// Now connect the enter nodes to their following update node, such that
// appendChild can insert the materialized enter node before this node,
// rather than at the end of the parent node.
for (var i0 = 0, i1 = 0, previous, next; i0 < dataLength; ++i0) {
if (previous = enterGroup[i0]) {
if (i0 >= i1) i1 = i0 + 1;
while (!(next = updateGroup[i1]) && ++i1 < dataLength);
previous._next = next || null;
}
}
}
update = new Selection(update, parents);
update._enter = enter;
update._exit = exit;
return update;
}
Now, am not sure if I have the full picture, but I would NOT approve a merge request featuring the above code. There were several instances where I went “Ugh” as I was reading this code:
- variables accessed before they were declared (consider
data
). I understand that hoisting works and that this wouldn’t be problematic, but still, ugh. - multiple assignment expressions in the same line:
data = new Array(this.size()), j = -1;
-
a completely unnecessary use of Array object constructor
data = new Array(this.size())
. As far as I know, the net result of:data = new Array(this.size()), j = -1; this.each(function(d) { data[++j] = d; });
is no different from:
data = [], j = -1; this.each(function(d) { data[++j] = d; });
simply because, unlike in other languages, which use dynamically resizing contiguous memory blocks for their vectors, JS arrays are just glorified hashtables, with a variable tracking the length of it.
- Highly “stateful” logic in that there is extensive mutations of variables. Then again, this might done with performance in mind.
Once you get past all the initial revulsion at the arcane coding style, what would strike one as puzzling is the use of this
in a seemingly non-member function. However, once you recall that the data
function is actually called on a collection of DOM elements that previous selectors returned, you realize that this function must be getting invoked on the collection. So, I started looking for where this gets bound to an object or an object prototype. A few greps later, I discovered in selection/index.js
:
Selection.prototype = selection.prototype = {
constructor: Selection,
select: selection_select,
selectAll: selection_selectAll,
filter: selection_filter,
data: selection_data,
enter: selection_enter,
exit: selection_exit,
merge: selection_merge,
order: selection_order,
sort: selection_sort,
call: selection_call,
nodes: selection_nodes,
node: selection_node,
size: selection_size,
empty: selection_empty,
each: selection_each,
attr: selection_attr,
style: selection_style,
property: selection_property,
classed: selection_classed,
text: selection_text,
html: selection_html,
raise: selection_raise,
lower: selection_lower,
append: selection_append,
insert: selection_insert,
remove: selection_remove,
datum: selection_datum,
on: selection_on,
dispatch: selection_dispatch
};
Now it all made sense. The following was what I could glean from reading the rest of the code without any further information:
- The
data
function is called with an array or object as its first argument. - The values of this input data are “bound” to the selected nodes, i.e. a one-to-one mapping between the data and the selected nodes is established using the
bindKey
orbindIndex
functions. bindKey
comes into play where the input data is itself an object,bindIndex
comes into play where the input data is an array.- What both the bind functions do is establish a mapping between the data and the input node selection. Each data item is stored in a member named
__data__
in the corresponding node. - What is returned is a transformed selection of nodes, each of which is “aware” of the data it needs to render as part of itself.
The Not So Clear Bits
- There is repeated mentions of “entry nodes” and “exit nodes” which is not clear to me as of now considering this is my first reading of this code. Hopefully, poking around the rest of the files and API docs should make things clearer, failing which I can probably drop into the d3 developer IRC channels for help.
Overall Impressions
What I Like About d3
- I love the API. If I were creating a set of utilities to do manipulation of the DOM (be it HTML or SVG), this is exactly how I would structure it.
- The code modularization is really superb. A beginner can jump in and start deciphering the key bits almost immediately.
- I love the essence of the idea behind D3 API: take input numerical data, apply a series of transformations on it, till you have a visualization. There is a lot of functional programming zen in this approach.
- I admire the fact that the code is readable enough for a noob to jump in and start examining the key internals of the code within just a few minutes.
What I Don’t Like About d3
- The d3 umbrella of modules tries to be everything that its user needs. While each of these modules seem well written and organized, I feel that d3 authors suffer from the ‘IKEA syndrome’ or the ‘Not-Invented-Here’ syndrome, choosing to write even the most basic array manipulation library functions by themselves, instead of using any of the dozens out there. For eg. I really can’t fathom why a visualization API would feature:
- date manipulation utilties (
d3.time
) - data interpolation toolkit (it has most of something like numpy in there!)
- XHR utilties (seriously?!)
- date manipulation utilties (
- The code uses a lot of archaic, non-idiomatic JS which seems a bit out of place considering the use of ES6 constructs mixed in with it.
Leave a Comment