Friday 20 January 2012

Fast path member access

I noted on Wednesday that accessing members by repeatedly doing a lookup in a Vector is slower than doing that lookup once. But it was even slower than I expected: the code was doing the lookup eight times more often than necessary, yet took over twelve times as long. So I forked the code and rewrote the tests to find out what was going on. The results are especially interesting.


The functions used are as follows:


function Test1():int {
    var iS:int, iT:int, u:Vec2D;
    iS = 0;
    for (iT = 0; iT < kNumTrials; iT++) {
        u = aData[iT];
        iS += u.iX;
    }
    return iS;
}

function Test2():int {
    var iS:int, iT:int;
    iS = 0;
    for (iT = 0; iT < kNumTrials; iT++) {
        iS += aData[iT].iX;
    }
    return iS;
}

The routines do essentially the same thing: each time through the loop they get a Vec2D from a Vector of them, then add its iX member to a running sum.

The difference is that the first routine does an extra step: it stores the Vec2D (or rather a reference to it) in a local variable u before using it. That should make it slower, or at least no faster, as it is doing at least as much work as the second routine. It's worth noting that aData is a Vector, so neither routine should have any trouble determining the type of the object being accessed.

Except the first routine is not slower; it's faster. It executes in about half the time of the second routine in my tests (the code is online and the application is embedded below).
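For reference, the timings can be taken with a minimal harness along these lines (a sketch, not the actual test code; it assumes flash.utils.getTimer, which returns milliseconds since the player started):

import flash.utils.getTimer;

function TimeTest(fTest:Function):int {
    var iStart:int = getTimer();
    fTest();                    // e.g. Test1 or Test2
    return getTimer() - iStart; // elapsed time in milliseconds
}

// Usage: trace("Test1: " + TimeTest(Test1) + " ms");

With kNumTrials large enough, millisecond resolution is plenty to see a factor-of-two difference.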

This was quite surprising, so to get some idea of why it happens I tried some other tests. The most interesting, which is also included in the test application, replaces the local variables with class members. This also takes twice as long, whether or not the Vec2D is copied before being used.
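The class-member variant I mean is roughly the following (a sketch; the member name mS is illustrative, not necessarily what the test application uses):

// mS is a class member rather than a local variable.
private var mS:int;

function Test3():int {
    var iT:int;
    mS = 0;
    for (iT = 0; iT < kNumTrials; iT++) {
        // The sum now lives in a class member, so each += is a
        // property access on this rather than a local-variable update.
        mS += aData[iT].iX;
    }
    return mS;
}

Only the storage location of the sum changes; the loop body is otherwise identical to Test2.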

I don't really know why this happens, although I have a theory: the runtime is optimised to work with data in local variables. In a complex expression such as

        iS += aData[iT].iX;

it needs to store an intermediate result, but does so less efficiently than it handles local variables. If a routine uses local variables for frequently accessed data, including intermediate results as above, it runs much faster.

This is like optimisations seen in compiled languages such as C++, which can keep local variables in registers to save memory accesses. Except a modern C++ compiler would do a better job of optimising the intermediate expression; one of many things a modern C++ compiler does better than AVM2, it seems.
