Tuesday, June 1, 2010

Optimizing Spreadsheet Operations

As you probably expect, most scripts read from or write to a spreadsheet, so we thought it'd be valuable to share some best practices for making spreadsheet automation as fast and efficient as possible.

Let's start with a general principle: JavaScript operations are an order of magnitude faster than talking to any services. Put another way, anything that Apps Script can do by itself will be exceedingly fast, while talking to Spreadsheets, Translate, Sites, etc, will be measurably slower. The best advice here is to call services sparingly - the least expensive spreadsheet operation is no spreadsheet operation!

Obviously though the entire purpose of Apps Script is to act as glue between all of these services, so how can keep things fast and efficient without sacrificing functionality? Well, there's really just one golden rule:
Don't make several calls when you can do it in one.
We'll talk a little about the functions available to make this easier, but first let's explain why this is the case. The obvious reason is that the calls are relatively expensive, so making fewer calls is naturally going to improve performance. Secondly, if you can batch your calls (more on batching in a moment), it means that you're less likely to interleave reads and writes (gets and sets).

Why is interleaving sets and gets so bad? Knowing that calls to the spreadsheet are relatively slow, we do some look ahead caching, and some write caching. Chances are that if you read (get) one cell then you're quite likely to read another one close by, relatively soon. Likewise, if you write (set) one cell then you're also likely to write another one relatively soon. Think about any loops in your code - you may iterate over spreadsheet rows, or columns for example, and the caches will really help you here.

Every time you do a read, we must first empty (commit) the write cache to ensure that you're reading the latest data (you can force a write of the cache by calling (SpreadsheetApp.flush();). Likewise, every time you do a write, we have to throw away the read cache because it's no longer valid. Therefore if you can avoid interleaving reads and writes, you'll get full benefit of the cache.

We've explained that calling services is expensive, and that you can gain the maximum benefit of the cache by grouping reads and writes. A logical next step is to use the batch operations in the Spreadsheet API. Most set and get functions have 'batch' versions, for example there's a getValue/setValue pair, and there's a corresponding getValues/setValues pair.

These batch functions operate on a range of cells at once, and you only pay the cost of a single service call even though there's a lot more data being transferred. Additionally, by organizing your code to maximize use of batch functions you are also avoid the interleaved reads and writes that can ruin cache performance - a rare double win!

To illustrate the benefits of the batch setters over operating cell by cell, let's take a real script example. We've chosen the gallery script "Spreadsheet Fractal Art" as our optimization candidate, but the same techniques can be applied to most scripts. This particular script sets the background colors of every cell in a 100x100 grid, so it is perfect for batch writing - here's the original loop that writes colors to the sheet:

var cell sheet.getRange('a1');
for (var 0100y++{
  xcoord xmin;
  for (var 0x < 100x++{
    var getColor_(xcoordycoord);
    cell.offset(yx).setBackgroundColor(c);
    xcoord += xincrement;
  }
  ycoord -= yincrement;
  SpreadsheetApp.flush();
}


You can see that we loop over 100 rows and 100 columns, writing a cell color at each location for a total of 10,000 calls to the spreadsheet. Our write back cache does help a little - in fact we force a write back using flush at the end of every line, so there are really only 100 calls to the spreadsheet owing to the good caching. In our rough tests, calling 'plot' as written takes about 1 minute and 10 seconds.

We can re-write the code to make use of the batch version of setBackgroundColor:

var cell sheet.getRange('a1');
var colors new Array(100);
for (var 0y < 100y++{
  xcoord xmin;
  colors[ynew Array(100);
  for (var 0x < 100x++{
    colors[y][xgetColor_(xcoordycoord);
    xcoord += xincrement;
  }
  ycoord -= yincrement;
}
sheet.getRange(11100100).setBackgroundColors(colors);


In our rough tests, this one clocks in at around 1 second. The timer isn't accurate enough to tell us if it is a full 100x faster, but even if it is only a 60x improvement it is still a huge benefit gained simply by playing to the strengths of the system.

If you have any great performance optimizing tips, or code that is running very slowly, we encourage you to post in the Apps Script support forum -- and we'll feature some of the best tips in a future post.





Note: we made a minor modification to the "Spreadsheet Fractal Art" script to make this example easier to follow.  The script as published uses the setBackgroundRGB call, rather than setBackgroundColor you see above - thus the getColor_ function was changed as follows:
if (iteration == max_iteration{
   return "#000000";
else {
   var 255 (iteration 5);
   Math.min(255Math.max(0c));
   var hex Number(c).toString(16);
   while (hex.length 2)
     hex "0" hex;
   
   return ("#"+hex+"3280");
}

4 comments:

  1. Nice post, very helpful. Would you expand on clearing write and read cache part?

    ReplyDelete
  2. Tell me... is;

    SpreadsheetApp.openById("mySpreadsheet").getSheetByName("mySheet");

    faster than;

    var ss = SpreadsheetApp.openById("mySpreadsheet");
    var sheet = ss.getSheetByName("mySheet");

    In other words, does chaining the function calls count as one single call to the service? Or do they still count as individual function calls?

    If it makes no difference, then there are cases where I'd prefer to do it the 2nd way, to make the code easier to read and more modular.

    ReplyDelete