Tuesday, November 01, 2011

MySQL Thread Pool: Storage Engines

I got a question from the NDB folks that are currently adapting MySQL Cluster to
MySQL 5.5 about whether any special developments are needed to adapt the NDB
storage engine for use with the thread pool. Then I realised there are more
people out there that write storage engines that want to know how to optimise
their storage engines for the thread pool.

So first of all any storage engine will work with the thread pool as they are today
without any modifications. It is however possible to improve the performance of the
MySQL Server when using the thread pool by adapting the storage engine to the
thread pool APIs.

The new API that has been added to the MySQL 5.5 server is the thd_wait interface.
This interface makes it possible for storage engines to report to a thread pool
plugin before starting a wait and after finishing a wait.

As an example, we have adapted the InnoDB storage engine by adding the thd_wait
interface calls around row locks in InnoDB and before file IO due to misses in
the InnoDB buffer pool. The InnoDB code have also been changed to make those
callbacks as part of the implementation of the --innodb-thread-concurrency and
when waiting for flushes of the buffer pool as part of checkpoints and other
activities where writes are required to ensure proper operation of InnoDB.

The NDB storage engine has very different reasons for the waits, the NDB storage engine
implements the actual data management in the NDB data nodes (these nodes runs in
separate processes separate from the MySQL Server), thus the only reason for waits
in the MySQL Server is when we're waiting for packets to return from the NDB data nodes.

Most third-party storage engines probably fit fairly well with InnoDB and/or NDB in how
they are integrated with the thread pool plugin. So there are storage engines that
perform all the work inside the MySQL Server. The more advanced such engines are likely
to also have a buffer pool and thus should consider calling the thd_wait interface
when doing IO, these storage engines are also likely to acquire row locks or some
similar level of data lock that sometimes will require an extended wait. There are
also other storage engines that are distributed in nature such as NDB, these
storage engines will want to make the callbacks to the new thread pool API when
waiting for responses on the network.

For storage engines that implement some data structure similar to the THD object in the
MySQL Server, there is one additional thing to consider. When using a thread pool it
makes sense to consider pooling such objects given that the thread pool
will pool threads. As an example, we have such an object called Ndb in the NDB API that
has the potential to be pooled. The benefits of pooling such objects are that it
means less time to create them, less memory usage and thus fewer CPU cache misses
due to their usage.

The thd_wait interface is really simple. It contains two calls thd_wait_begin and
thd_wait_end. Both calls have the THD object as the first parameter. Often the THD
object isn't known in the storage engine code when needed. In this case one simply
uses NULL as the THD object. The thd_wait interface can even handle the case where
the thd_wait interface is used from threads that are private to the storage engine.
The thread pool will discover that there is no THD object attached to the thread
and ignore the call.

The thd_wait_begin call also have a second parameter that specifies the type of
wait that will show up in the thread pool information schema tables. There
will be statistics on waits per type. There are currently 10 wait types.

To see an example of usage of this interface, search for thd_wait in the InnoDB
storage engine source code in the MySQL 5.5 community server.

The MyISAM storage engine does not use this API because MyISAM relies on the
MySQL Server for locking. Also, MyISAM assumes that the OS takes care of caching
of pages. This means that there is a very high probability that writes to the
file system are handled directly in the file system cache without involving any
long waits.

What is effect of not modifying a storage engine to implement the thd_wait
interface? The thread pool operates by trying to always have one thread active
per thread group. If the active thread is blocked and the thread pool is informed
of the block, then the thread pool can start another thread to ensure that the
thread group is being efficiently used. If the storage engine is not modified to
implement the thd_wait interface, the thread pool is not informed of the block.
In this case, the thread group will be blocked for a while until the wait is
completed or until the query is defined as stalled. The throughput of the system
can to some extent be handled in those cases by increasing the number of thread
groups.

So implementing the thd_wait interface means better throughput and also less
variance of the throughput and waiting times.

To use these interfaces in a file, include two header files (the thd_wait interface is
part of the plugin APIs in the MySQL 5.5 community and commercial servers).

#include "mysql/plugin.h"
#include "mysql/service_thd_wait.h"

Below is the most important information in these header files.

typedef enum _thd_wait_type_e {
THD_WAIT_SLEEP= 1,
THD_WAIT_DISKIO= 2,
THD_WAIT_ROW_LOCK= 3,
THD_WAIT_GLOBAL_LOCK= 4,
THD_WAIT_META_DATA_LOCK= 5,
THD_WAIT_TABLE_LOCK= 6,
THD_WAIT_USER_LOCK= 7,
THD_WAIT_BINLOG= 8,
THD_WAIT_GROUP_COMMIT= 9,
THD_WAIT_SYNC= 10,
THD_WAIT_LAST= 11
} thd_wait_type;
void thd_wait_begin(MYSQL_THD thd, int wait_type);
void thd_wait_end(MYSQL_THD thd);

THD_WAIT_SLEEP: For uninterrupted sleeps.
THD_WAIT_DISKIO: For file IO operations that are very likely to cause an actual
disk read.
THD_WAIT_ROW_LOCK: For row locks/page locks in the storage engine.
THD_WAIT_GLOBAL_LOCK: For global locks such as the global read lock in the MySQL
Server.
THD_WAIT_TABLE_LOCK: When waiting for a table lock.
THD_WAIT_META_DATA_LOCK: For waiting on a meta data lock which isn't a table lock.
THD_WAIT_USER_LOCK: For some type of special lock.
THD_WAIT_BINLOG: When waiting for the replication binlog.
THD_WAIT_SYNC: When waiting for an fsync operation.

It's quite likely we will introduce more wait types, such as the wait on the network.