Monday, July 14, 2008

Feeling Researchy

For some reason, lately I'm feeling a mood I can only describe as "researchy". Related trivia that caught my eye lately... I ran across a research group at UC Berkeley and attended one of their public seminars; found that former co-worker Kevin Stoodley from the IBM Toronto Lab is an IBM fellow; and reminisced over some history on IBM's research group, with extra acronyms thrown in.

Saturday, July 12, 2008

Javascript Performance Considered as a Helix of Semi-Precious Hash Tables

I had a discussion about Javascript performance the other day, which reminded me that the more things change, the more they stay the same. Some of the things casual Javascript programmers are grappling with today, were big research issues for IBM's
TOBEY compiler technology a few years ago. (And maybe still, for all I know.)

Let's start with the basics. Whenever you're referencing objects in Javascript, you're getting pointers, not deep copies. So:

var x = a.b.c.d.e;
x.something = 0;
x.something_else = 'foo';

is actually modifying attributes of the original a.b.c.d.e object.

Intuitively, you can see that you're saving bytes in the program, which means less time to transmit the script file, less time to parse, and saving space in the browser cache which could avoid re-fetching some page, image, or other resource later. But there are performance implications at a deeper level too.

Javascript's object orientation is like Perl's, based all on hash tables, except it doesn't hit you over the head with that fact. So x.y.z involves looking up the key 'y' in x's hash table, which returns another hash table, in which you look up the key 'z'.

Array notation and object notation are effectively interchangeable. x.foo works the same as x['foo']. x[0], x[1], x[2], etc. are not necessarily contiguous in memory, they're just entries in a hash table whose keys happen to be integers. Looping through an array involves a hash table lookup each time, not just incrementing a pointer.

The scoping and closure features in Javascript, plus its interpreted nature, mean it has to do more lookups to figure out how to resolve each variable reference. For example:

function foo()
{
var a, x;

function bar()
{
var a, y;

for (...)
{
var a, z;

a = 1;
// a is found in current scope, requiring just one hash table lookup.
x = 'hello';
// x is not found in current scope;
// x is not found in parent scope;
// x _is_ found in grandparent scope.
// Meaning 3 hash table lookups to figure out where x is.

When all these slight variations in efficiency get put inside loops or frequently called functions, that's when the performance can start to drag. That's why you see people doing things like:

for (var i = 0, limit = some_object.length; i < limit; i++) ...

If i is declared outside the loop, each reference to it to test or increment means having to find it in some outer scope. If some_object.length is referenced in the loop bounds test, each time through the loop has to locate some_object in the right scope, plus do another hash table lookup to retrieve the length property. The length property could be modified by code inside the loop, so even though you consider some_object.length to be a constant value, Javascript can't assume the same.

Getting back to the notion of our first example, when you store a pointer to some deeply nested object and refer directly to that object, multiple times:

var it = x.y.z.style;
it.marginTop = "0em";
it.marginBottom = "1em";
it.marginLeft = "1em";
it.marginRight = "2em";

you're saving all those lookups, both for scope and traversing the hash tables of object members, for each reference.

At the DOM level, you've got a property element.childNodes and a method element.hasChildNodes(). Seems intuitive that checking for the existence of child nodes should be faster than returning the actual nodes, right?

Remember, childNodes is just giving you a pointer to a data structure that is being maintained continuously, not making a copy of anything, or assembling the structure on-demand. So the overhead in each case is constant. That data structure could be empty, which is why the has*() method could be useful. There's a distinction sometimes between "this object exists" and "this object contains something useful", so you wind up doing things like:

els = document.getElementsByTagName('a');
if (els && els.length > 0) ...
// Must test that the assignment succeeded,
// _and_ that what came back wasn't an empty structure