Friday, May 15, 2009

Shootout of split page hash from InnoDB buffer pool mutex

One of the hot mutexes in InnoDB is the buffer pool mutex.
Among other things this mutex protects the page hash, the
hash table used to look up pages that are in the cache.

There are already a number of variants of how to split out
this mutex. Here follows a short description of the various
approaches.

1) Google v3 approach
Ben Hardy at Google took the approach of using an array of
64 mutexes, where each mutex only protects the actual
reads, inserts and deletes on the page hash table. This
makes for a very simple patch. It also means that once the
block has been locked one has to check that the owner of
the block hasn't changed, since the block is not protected
between the read of the hash and the locking of the block;
someone can come in between and grab the block for another
page before we get to lock it. In addition this patch
focuses mainly on optimising the path in buf_page_get_gen,
which is the routine used to get a page from the page
cache and thus the hot-spot. A small sketch of this
lock-and-recheck pattern follows below.
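
To make the lock-and-recheck pattern concrete, here is a
minimal sketch in C. The names (page_hash_mutex_get,
hash_lookup, the block fields) and the pthread-based
locking are my own illustration and not the actual InnoDB
or Google patch code; the point is only the ordering of
hash lookup, block lock and owner re-check.

    #include <pthread.h>
    #include <stddef.h>

    #define N_PAGE_HASH_MUTEXES 64

    typedef struct buf_block_struct {
      unsigned long   space;   /* tablespace id of the page held */
      unsigned long   offset;  /* page number of the page held */
      pthread_mutex_t mutex;   /* per-block mutex */
    } buf_block_t;

    /* Hypothetical lookup in the page hash table, assumed to
       exist elsewhere. */
    extern buf_block_t* hash_lookup(unsigned long space,
                                    unsigned long offset);

    /* Assumed to be initialised at startup. */
    static pthread_mutex_t page_hash_mutexes[N_PAGE_HASH_MUTEXES];

    /* Pick one of the 64 mutexes based on the page identity. */
    static pthread_mutex_t* page_hash_mutex_get(unsigned long space,
                                                unsigned long offset)
    {
      return &page_hash_mutexes[(space ^ offset) % N_PAGE_HASH_MUTEXES];
    }

    buf_block_t* buf_page_get_sketch(unsigned long space,
                                     unsigned long offset)
    {
      for (;;) {
        pthread_mutex_t* hash_mutex = page_hash_mutex_get(space, offset);

        /* Hold the hash mutex only across the hash lookup itself. */
        pthread_mutex_lock(hash_mutex);
        buf_block_t* block = hash_lookup(space, offset);
        pthread_mutex_unlock(hash_mutex);

        if (block == NULL) {
          return NULL;            /* page not in the buffer pool */
        }

        /* The block was unprotected between the lookup and this
           point, so lock it and re-check that it still holds our
           page. */
        pthread_mutex_lock(&block->mutex);
        if (block->space == space && block->offset == offset) {
          return block;           /* caller releases block->mutex */
        }

        /* The block was re-used for another page in the meantime;
           unlock and retry. */
        pthread_mutex_unlock(&block->mutex);
      }
    }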

2) Percona approaches
Percona has tried a series of approaches, where the first
only split out the page hash under a single mutex of its
own, still protecting the blocks from being changed while
holding this mutex. The next step was to change this mutex
into a read-write lock.

3) My approach
My approach was inspired by Percona's but added two main
things. First, it split the page hash into a number of
page hashes with one RW-lock per page hash (this number
has been tested with 4, 8 and 16, and 4 was the optimum,
on Linux at least). Second, to avoid having to lock and
unlock multiple page hash locks while going through the
read-ahead code, the hash function that decides which page
hash to use maps all pages within 1 MByte (the unit of
read ahead in InnoDB) to the same page hash.
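
As an illustration, the partitioning idea could look
roughly like the sketch below, assuming 16 KByte pages so
that a 1 MByte read-ahead area covers 64 consecutive page
numbers. The names and the exact way the space id is mixed
in are my own assumptions, not the actual patch.

    #define N_PAGE_HASH_PARTS         4   /* 4 was the optimum in my tests */
    #define PAGES_PER_READ_AHEAD_AREA 64  /* 1 MByte / 16 KByte pages */

    /* All pages in the same 1 MByte read-ahead area map to the
       same partition, so the read-ahead code only ever takes one
       of the per-partition RW-locks. */
    static unsigned page_hash_part(unsigned long space,
                                   unsigned long offset)
    {
      unsigned long area = offset / PAGES_PER_READ_AHEAD_AREA;
      return (unsigned)((space + area) % N_PAGE_HASH_PARTS);
    }

The read-ahead code can then compute the partition once for
the whole area and take the corresponding RW-lock a single
time instead of once per page.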

Pros and Cons

The Google patch is the simplest of the approaches, and by
focusing only on buf_page_get_gen it avoids a lot of the
possible extra traps that are likely if one tries to solve
too much of the problem at once.

Using a RW-lock instead of a mutex seems like one way of
improving the concurrency, but it could of course impose a
higher overhead as well, so benchmarking should show which
is best.

When using an array of locks it makes sense to optimise
for the read-ahead functionality, since this is a hot-spot
in the code, as has been shown in some blogs lately.

4) Mixed approach
So a natural solution is then to also try a mix of the
Google variant with my approach. So still using an
array of locks (either mutex or RW-locks, whatever
has the optimal performance) but ensuring that the
pages within a read ahead area is locked by the same
lock.
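
Combining the two sketches above, the mixed variant could
look roughly like this, again with hypothetical names and
assuming RW-locks turn out to be the better choice.

    #include <pthread.h>

    #define N_PAGE_HASH_PARTS         4
    #define PAGES_PER_READ_AHEAD_AREA 64  /* 1 MByte with 16 KByte pages */

    static pthread_rwlock_t page_hash_locks[N_PAGE_HASH_PARTS];

    /* Same read-ahead-aligned hash as before: every page in a
       1 MByte area maps to the same lock in the array. */
    static pthread_rwlock_t* page_hash_lock_get(unsigned long space,
                                                unsigned long offset)
    {
      unsigned long area = offset / PAGES_PER_READ_AHEAD_AREA;
      return &page_hash_locks[(space + area) % N_PAGE_HASH_PARTS];
    }

Readers of the page hash (such as buf_page_get_gen) would
take the lock in shared mode, while inserts and deletes
take it in exclusive mode; the lock-and-recheck of the
block owner stays exactly as in the first sketch.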

This approach combines the simplicity and the total lack
of deadlock problems of the Google approach with the
optimised lock layout from my approach and the idea of
RW-locks from Percona.

We don't have any results from this shootout yet. The
shootout should also discover the optimum number of parts
to split the page hash into; Google has used 64, but my
results so far indicate that 4 is more appropriate.
