Writing Efficient Squirrel

How To Optimize Your Squirrel Code For Size And Performance

In Squirrel, there are often several ways of performing a task, and some are more efficient than others, in memory usage, in run-time performance, or in both. Every imp has only limited reserves of each.

Use Blobs To Represent Arrays Of Bytes

Helps with memory usage, performance

Squirrel’s standard blob class is the best approach for preparing byte arrays, for instance to send data over UART or SPI interfaces. Here is a complete code example which uses a blob to write binary data to SPI:

// Strip of 16 WS2801 RGB LED drivers

local r = 0;
local g = 0;
local b = 0;

function timer() {
    imp.wakeup(0.01, timer);

    r = (r + 3) % 51000;
    g = (g + 11) % 51000;
    b = (b + 7) % 51000;

    local r3 = r / 100;
    local g3 = g / 100;
    local b3 = b / 100;

    local r2 = (r3 > 255) ? 510-r3 : r3;
    local g2 = (g3 > 255) ? 510-g3 : g3;
    local b2 = (b3 > 255) ? 510-b3 : b3;

    local out = blob(48);
    for (local i = 0 ; i < 16 ; i++) {
        out.writen(r2, 'b');
        out.writen(g2, 'b');
        out.writen(b2, 'b');
    }

    hardware.spi257.write(out);

    // WS2801 datasheet says idle for 500us to latch
    imp.sleep(0.001);
}

hardware.spi257.configure(SIMPLEX_TX, 15000);  // Datasheet says max 25MHz
hardware.pin5.configure(DIGITAL_OUT, 0);
imp.sleep(0.01);
hardware.pin5.write(1);
server.log("LED strip started...");
timer();

Use blobs in place of strings if you can because Squirrel strings are immutable and so manipulating them involves more memory turnover. However, you currently have to use strings for I²C, and strings remain a good option for constant byte arrays.
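As a rough sketch of the difference, the two functions below build the same run of bytes, first as a string and then as a blob. The function names and parameters are hypothetical and used only for illustration. The string version creates a new string on every pass through the loop, whereas the blob version writes into a buffer that is allocated once.

```squirrel
// Hypothetical helper: builds 'count' copies of 'value' as a string.
// Strings are immutable, so each += creates a new, longer string
function fillAsString(count, value) {
    local s = "";
    for (local i = 0 ; i < count ; ++i) {
        s += value.tochar();
    }
    return s;
}

// Hypothetical helper: builds the same bytes in a blob.
// The blob is allocated once and then written in place
function fillAsBlob(count, value) {
    local b = blob(count);
    for (local i = 0 ; i < count ; ++i) {
        b.writen(value, 'b');
    }
    return b;
}
```

Either result can be indexed with [] to read back individual bytes.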

Similarly, don’t use Squirrel arrays to store bytes. These are really arrays of full Squirrel objects, so each one of your data bytes takes substantially more than a byte of memory to store.

Here is an example of a wasteful byte array:

setupParams <- [0x55, 0x23, 0x11, 0xFF];

This is better represented like this:

const SETUP_PARAMS = "\x55\x23\x11\xFF";

Both of these can be indexed with [] and iterated over with foreach if needed:

foreach (i in setupParams) { ... }

foreach (i in SETUP_PARAMS) { ... }

Arrays Are More Efficient Than Tables

Helps with memory usage, performance

Arrays are smaller and faster than tables. If you have a lot of data, the difference can be enormous: one agent whose central data structure was a 3000-entry array of three-entry tables kept running out of memory, because the structure took up 720KB. Switching it around, to a three-entry table of 3000-entry arrays, reduced its memory usage to just 150KB.
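The two layouts described above might look something like the following sketch. The field names (ts, temp, humid) and the function names are assumptions made for illustration; they are not taken from the agent in question.

```squirrel
// Hypothetical data set: 3000 samples of three fields each
const SAMPLE_COUNT = 3000;

// Layout A: an array of three-entry tables.
// Every sample pays the overhead of its own table
function makeArrayOfTables(count) {
    local samples = array(count);
    for (local i = 0 ; i < count ; ++i) {
        samples[i] = { ts = i, temp = 0, humid = 0 };
    }
    return samples;
}

// Layout B: a three-entry table of arrays.
// There is only one table, however many samples are stored
function makeTableOfArrays(count) {
    return {
        ts = array(count, 0),
        temp = array(count, 0),
        humid = array(count, 0)
    };
}

local byRow = makeArrayOfTables(SAMPLE_COUNT);
local byColumn = makeTableOfArrays(SAMPLE_COUNT);

// Reading the temperature of sample 42 in each layout
local t1 = byRow[42].temp;
local t2 = byColumn.temp[42];
```

The only change for the rest of the code is the order of the two indexing steps.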

Choose Between Classes And Tables

Helps with memory usage

If you’re only going to instantiate your class once, then it takes less memory to use a table instead. Conversely, if you need two or more instances, then it takes less memory to use class instances.

// Efficient if there's only one ioexpander
ioexpander <- {
    function write(x, y, z) { ... }
}

// Efficient if you need two or more
class IOExpander {
    function write(x, y, z) { ... }
}

ioexpander1 <- IOExpander(hardware.i2c12);
ioexpander2 <- IOExpander(hardware.i2c89);

Choose Between Top-level Locals And Globals...

Helps with memory usage, code size

Variables declared local at the top level of a program, i.e. apparently not within any function, are in fact local to a notional main() function that corresponds to the main body of the program. They look like this:

local r = 0, g = 0, b = 0;

If you have a large number of such top-level locals, they can take up extra run-time memory and generate a bigger code footprint than do members of the root table — global variables — which look like this:

r <- 0 ; g <- 0 ; b <- 0;

However, using locals will be faster — and may in future become faster still as this approach makes optimization of the code easier. Locals are also discarded once no part of the program references them, whereas globals are retained throughout the run-time of the program. This means that, for instance, local functions only used during your program’s set-up will be automatically discarded from run-time memory once they are no longer needed.
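A minimal sketch of that last point, using hypothetical names throughout: the set-up helper is declared as a top-level local, while the data it produces, which is needed for as long as the program runs, goes into the root table.

```squirrel
// Hypothetical set-up helper, called only once while the program starts.
// Because it is a top-level local, nothing keeps it alive afterwards
local function buildSquares(count) {
    local result = array(count);
    for (local i = 0 ; i < count ; ++i) {
        result[i] = i * i;
    }
    return result;
}

// The result is needed throughout the run-time, so it is a global
squares <- buildSquares(16);

// This function refers to the global data, not to buildSquares()
function square(n) {
    return squares[n];
}
```

Had buildSquares() been declared as a global function instead, it would stay in the root table, and therefore in memory, for the whole run-time of the program.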

...But In Functions, Local Is Faster Than Global

Helps with performance

Local variables in functions are looked up by index, not by string hash. This means they are much faster than global variables (or class members, or any other sort of variable). So this:

f <- "";
function readSerial() {
    for (local i = 0; i < 1000 ; ++i) {
        f += hardware.uart1289.read().tochar();
    }
}

is slower than this:

f <- "";
function readSerial() {
    local f2 = "";
    for (local i = 0; i < 1000 ; ++i) {
        f2 += hardware.uart1289.read().tochar();
    }
    f = f2;
}

Cache Expensive Look-ups

Helps with performance

Squirrel tables, including the root table, are hash tables, so each name in an expression such as hardware.pin8.write costs a look-up every time the expression is evaluated. If you’re performing the same operation repeatedly, it’s faster to do the look-up just once. This code:

b <- blob(1000);
for (local i = 0 ; i < 1000 ; ++i) {
    // Chip select is on pin8
    hardware.pin8.write(0);
    b[i] = hardware.spi257.readblob(1)[0];
    hardware.pin8.write(1);
}

runs much more slowly than this:

b <- blob(1000);
cs <- hardware.pin8;
myspi <- hardware.spi257;
for (local i = 0 ; i < 1000 ; ++i) {
    // Chip select is on pin8
    cs.write(0);
    b[i] = myspi.readblob(1)[0];
    cs.write(1);
}

The following code is even faster because it assigns the function itself to a variable and calls it directly. Note that this has to be done in two steps so that the function, cswriter(), can be bound to its context, hardware.pin8.

b <- blob(1000);
cs <- hardware.pin8;
cswriter <- cs.write.bindenv(cs);
myspi <- hardware.spi257;
myspireader <- myspi.readblob.bindenv(myspi);
for (local i = 0 ; i < 1000 ; ++i) {
    // Chip select is on pin8
    cswriter(0);
    b[i] = myspireader(1)[0];
    cswriter(1);
}

The Best Form Of A Constant Is A const

Helps with memory usage, code size, performance

If there’s a value that you’d like to name for clarity’s sake, but which never changes, the best thing to do is make it a Squirrel constant:

const I2C_ADDR_IOEXPANDER = 0x7C;

Such constants are substituted at compile-time and become literal values in the compiled Squirrel code, i.e. they are both compact and quick to access. Note that the substitution only applies to code that follows the const statement: a use of the constant’s name that appears earlier in the source is not substituted. So this code will fail:

function getAddress() {
    return I2C_ADDR_IOEXPANDER;
}

const I2C_ADDR_IOEXPANDER = 0x7C;

To avoid this, place your constant declarations as early in your code as possible.

This style of constant declaration is less efficient:

I2C_ADDR_IOEXPANDER <- 0x7C;

as it requires a table look-up at run-time. This approach is less efficient still:

local I2C_ADDR_IOEXPANDER = 0x7C;

as Squirrel goes out of its way to share the value among all the functions which use it. But even this is more efficient than this code:

class IoExpander {
    static I2C_ADDR = 0x7C;
    // . . .
};

This use of a static variable, one shared among all the instances of the class, adds to the somewhat complex Squirrel data structures representing the class type, and takes up more run-time memory. Static class members may be just like normal variables in C++, but in Squirrel, they’re not!

Give The Compiler A Helping Hand

Helps with code size, performance

With many languages, you’ll hear it said, ‘Don’t try to outsmart the compiler’. When using Squirrel, that advice is less good — it’s currently all too possible to be smarter than the compiler. This means that your Squirrel code can sometimes be smaller and run faster if you optimize your code manually, in ways which modern C++ or JavaScript implementations might have done for you:

Dead-code elimination: If there’s a variable, method or function that you never use, delete it from your code (or wrap it in a comment).
Constant folding: If you use an expression to specify a constant, such as blob(8*1024), Squirrel will do the multiplication at run-time. Instead, do the calculation yourself and put the result in your code: in this case, blob(8192).
Common sub-expression elimination: If your code does the same calculation, or partial calculation, several times in a row, then instead calculate it just once and reuse the value.
Hoisting: If you’ve got a loop (for, foreach or while) which is executed lots of times, try to move as much work as possible outside the loop (as seen above under Cache Expensive Look-ups).
Short names: Long names (for functions, variables and fields) really do take up more space than short ones, especially if they’re used in several different functions.
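To show how several of these techniques combine, here is a sketch of one small function before and after hand optimization. The function names, parameters and the calculation itself are hypothetical, chosen only to give each technique something to work on.

```squirrel
// Before: the work is left to the compiler
function scaleSlow(samples, gain, offset) {
    local out = blob(8 * 1024);     // Multiplication done at run-time
    for (local i = 0 ; i < samples.len() ; ++i) {
        // samples.len() and (gain * 2 + offset) are
        // re-evaluated on every pass through the loop
        out.writen((samples[i] + (gain * 2 + offset)) & 0xFF, 'b');
    }
    return out;
}

// After: the same work, optimized by hand
function scaleFast(samples, gain, offset) {
    local out = blob(8192);         // Constant folded by hand
    local n = samples.len();        // Hoisted out of the loop
    local k = gain * 2 + offset;    // Common sub-expression, calculated once
    for (local i = 0 ; i < n ; ++i) {
        out.writen((samples[i] + k) & 0xFF, 'b');
    }
    return out;
}
```

Both versions write the same bytes. The second is harder to read at a glance, which is the trade-off to weigh before applying changes like these.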

These techniques can make your code less readable and maintainable, so it’s only worth applying them when necessary.

They all, of course, also act as suggestions to us about how we could improve the Squirrel compiler in the future. We hope that, over time, the optimizations discussed above will one-by-one vanish from this page!