The dark art of Essbase optimization

I have been involved with Essbase in one way or another for the last 14 years. This involved doing Essbase development, implementations at clients in various industries, running implementation teams, project management, proof of concepts and sales demonstrations, product shootouts, support and quality control, in short just about anything you can do with Essbase. However I would say that easily half this time, if not more, was spent doing optimization of Essbase cubes.

I think one of the reasons Essbase optimization can be somewhat tricky, to put it mildly, is because Essbase cube performance is almost directly linked to the design of the cube, i.e. the dimensions, hierarchies and members in each dimension, stored vs. dynamic members, etc. This is unlike other (relational) systems where there are general technical items to check for optimization, which has nothing to do with the data in the system (except maybe for the amount of data). In Essbase the data dictates the cube structure, and the way the cube storage works in blocks and indexes, and therefore every cube is different and has different performance implications. Also Essbase is unique in that just increasing hardware specifications (specifically memory and CPU power), while causing some improvements as a matter of course, will not cause as dramatic performance improvement as changing the cube design to be optimal for the data set used.

This means that is can be difficult to apply the same process to optimize every Essbase cube you encounter. I think this is also why Essbase optimization is sometimes seen as something of a dark art, with mysterious figures cackling manically while tweaking dense and sparse settings randomly. While getting optimal performance per cube can certainly sometimes be more luck than anything else, in my experience with Essbase optimization there are certain common elements which should be checked first, and which will insure optimal performance in most cases. I will discuss some of these below, and please note this is applicable to block storage cubes, I will blog about ASO optimization in another post.

The Essbase optimization main items checklist (for block storage cubes)

  • Block size
    • Large block size means bigger chunks of data to be pulled into memory with each read, but might also mean more of the data you need is in memory IF your operations are done mainly in-block. Generally I prefer smaller block sizes, but there is not a specific guide. In the Essbase Admin Guide they say blocks should be between 1-100Kb in size, but nowadays with more memory on servers this can be larger. My experience is to make blocks below 50Kb but not less the 1-2Kb, but this is all dependent on the actual data density in the cube. Do not be afraid to experiment with dense and sparse settings to get to the optimal block size, I have done numerous cubes with just one dimension as dense (typically a large account dimension), and cubes where neither the account nor time dimension is dense. You will know you have good block size selection by looking at the next point, block density.
  • Block density
    • This gives an indication of the average percentage of each block which contains data. In general data is sparse, therefore a value over 1% is actually quite good. If your block density is over 5%, then your dense/sparse setting is generally spot-on. Look at this whenever you change dense and sparse settings in conjunction with block size to see if you settings are optimal. A large block with high density is OK, but large blocks with very low density (< 1%) not.
  • Cache settings
    • Never ever leave a cube with the default cache settings. Often a client complains about Essbase performance, and sure enough when I look at the cache settings it is the default settings. This is never enough (except for a very basic cube). Rule of thumb here is to see if you can get the entire index file into the cache, and make the data cache 3 times index cache, or at least some significant size. Also check your cube statistics to see the hit ratio on index and data cache, this gives an indication what % of time the data being searched is found in memory. For index cache this should be as close to 1 as possible, for data cache as high as possible.
  • Outline dimension order
    • Remember the hourglass principle. This means order the dimensions in your outline as follows – first put the largest (in number of members) dense dimension, then the next largest dense dimension, and continue until the smallest dense dimension. Now put the smallest sparse dimension, then next smallest, and continue until the largest sparse dimension. Because of the way the Essbase calculator works, this arrangement optimizes number of passes through a cube. A variation of this which also seems to work well is the hourglass on a stick, where you put all non-aggregating sparse dimensions (i.e. years,, scenario) beneath the largest sparse dimension.
  • Commit Block settings
    • This controls how often blocks in memory are written to disk while busy loading or calculating a cube. You want to minimize disk writes, as this takes up a lot of processing time, so set this to be quite high. The default setting is 3000 blocks, if your block size is relatively small (< 10KB) make this much higher, 20000 to 50000. This setting alone can cause dramatic performance improvements specifically on Calc All operations and cube loads.
  • Use of FIX..ENDFIX in calc scripts
    • One of the most misunderstood and common mistakes in calc scripts is the usage of FIX..ENDFIX. Always FIX first on sparse dimensions only, and then within this FIX again on dense members, or use dense members in IF statements within the FIX statements. The reason for this is that if you FIX only on sparse members first, it filters on just specific blocks, which is faster than trying to fix within blocks (i.e. dense members).
  • Optimizing data loads
    • The best technique to make large data loads faster is to have the optimal order of dimensions in your source file, and to sort this optimally. Do do this, order the fields in your source file (or SQL statement) by having as your first field your largest sparse dimension, your next field your next largest sparse dimension, and so on. So if you are using the hourglass dimension order, you data file should have dimensions listed from the bottom dimension upwards. Your dense dimensions should always be last, and if you have multiple data columns these should be dense dimension members. Then you should sort the data file in the same order, i.e. by largest sparse dimension, next largest sparse dimension, etc. This will cause blocks to be created and filled with data in sequence, making the data load faster and the cube less fragmented.

These are just the initial general optimization points which can cause huge performance improvements without too much effort, in a next post I will look at more advanced optimization techniques, but generally these ones should handle 70% of your optimization issues.