Wednesday, July 05, 2006

PARTITION by a date column

One of the most common usage of partitioning is where one wants to partition
by date. One might want to have one partition per year or per month or per
week or per day. This blog entry shows how to handle this requirement using
MySQL 5.1.

The most common method to partition in this case is by range.
Partitioning in 5.1 uses a function on one or more fields of the table. In 5.1
there are some requirements on these fields if unique indexes or primary keys
also exist in the table. The reason is that 5.1 doesn't have support for global
indexes. Development of support for this have however started so should be in
some future release of MySQL.

In 5.1 functions have to return an integer. There are two functions that has
special support in the MySQL server. These are TO_DAYS() and YEAR(). Both
of these functions take a DATE or DATETIME argument and return an integer.
YEAR() returns the year and TO_DAYS() returns the number of days passed
since a particular start date.

The MySQL optimizer has special support for these two partition functions. It
knows that those functions are strictly increasing and use this knowledge to
discover that queries such as:

SELECT * from t1 WHERE a <= '1991-01-31' AND a >= '1991-01-01';
with a partition function PARTITION BY RANGE (to_days(a)) can be mapped
to a range of partition function values starting at
TO_DAYS('1991-01-01') and ending at TO_DAYS("1999-01-31")

Thus the MySQL Server can map TO_DAYS('1991-01-01') to a starting partition
and TO_DAYS('1991-01-31') to an ending partition. Thus we only need to scan
partitions in a range of partitions.

Most functions don't have this nice mapping from value range to partition
range. The functions TO_DAYS(date) and YEAR(date) are known by the
MySQL optimizer to have this attribute and they will thus be better for range
optimisations. Also a partition function on a field which is an integer field
where the function is the field by itself will have this characteristic. Other
functions won't, theoretically many more can be handled but this requires
special care of overflow handling to be correct and this will be added in
some future MySQL release.

So with this knowledge let's set up a that does partition by month.

CREATE TABLE t1 (a date)
PARTITION BY RANGE(TO_DAYS(a))
(PARTITION p3xx VALUES LESS THAN (TO_DAYS('2004-01-01'),
PARTITION p401 VALUES LESS THAN (TO_DAYS('2004-02-01'),
PARTITION p402 VALUES LESS THAN (TO_DAYS('2004-03-01'),
PARTITION p403 VALUES LESS THAN (TO_DAYS('2004-04-01'),
PARTITION p404 VALUES LESS THAN (TO_DAYS('2004-05-01'),
PARTITION p405 VALUES LESS THAN (TO_DAYS('2004-06-01'),
PARTITION p406 VALUES LESS THAN (TO_DAYS('2004-07-01'),
PARTITION p407 VALUES LESS THAN (TO_DAYS('2004-08-01'),
PARTITION p408 VALUES LESS THAN (TO_DAYS('2004-09-01'),
PARTITION p409 VALUES LESS THAN (TO_DAYS('2004-10-01'),
PARTITION p410 VALUES LESS THAN (TO_DAYS('2004-11-01'),
PARTITION p411 VALUES LESS THAN (TO_DAYS('2004-12-01'),
PARTITION p412 VALUES LESS THAN (TO_DAYS('2005-01-01'),
PARTITION p501 VALUES LESS THAN (TO_DAYS('2005-02-01'),
PARTITION p502 VALUES LESS THAN (TO_DAYS('2005-03-01'),
PARTITION p503 VALUES LESS THAN (TO_DAYS('2005-04-01'),
PARTITION p504 VALUES LESS THAN (TO_DAYS('2005-05-01'),
PARTITION p505 VALUES LESS THAN (TO_DAYS('2005-06-01'),
PARTITION p506 VALUES LESS THAN (TO_DAYS('2005-07-01'),
PARTITION p507 VALUES LESS THAN (TO_DAYS('2005-08-01'),
PARTITION p508 VALUES LESS THAN (TO_DAYS('2005-09-01'),
PARTITION p509 VALUES LESS THAN (TO_DAYS('2005-10-01'),
PARTITION p510 VALUES LESS THAN (TO_DAYS('2005-11-01'),
PARTITION p511 VALUES LESS THAN (TO_DAYS('2005-12-01'),
PARTITION p512 VALUES LESS THAN (TO_DAYS('2006-01-01'),
PARTITION p601 VALUES LESS THAN (TO_DAYS('2006-02-01'),
PARTITION p602 VALUES LESS THAN (TO_DAYS('2006-03-01'),
PARTITION p603 VALUES LESS THAN (TO_DAYS('2006-04-01'),
PARTITION p604 VALUES LESS THAN (TO_DAYS('2006-05-01'),
PARTITION p605 VALUES LESS THAN (TO_DAYS('2006-06-01'),
PARTITION p606 VALUES LESS THAN (TO_DAYS('2006-07-01'),
PARTITION p607 VALUES LESS THAN (TO_DAYS('2006-08-01'));

Then load the table with data. Now you might want to see the
data from Q3 2004. So you issue the query:
SELECT * from t1
WHERE a >= '2004-07-01' AND a <= '2004-09-30';
This should now only scan partition p407, p408, p409. You can
check this by using EXPLAIN PARTITIONS on the query:
EXPLAIN PARTITIONS SELECT * from t1
WHERE a >= '2004-07-01' AND a <= '2004-09-30';

You can also get similar results with more complicated expressions.
Assume we want to summarize on all measured Q3's so far.
SELECT * from t1
WHERE (a >= '2004-07-01' AND a <= '2004-09-30') OR
(a >= '2005-07-01' AND a <= '2005-09-30');

Using EXPLAIN PARTITIONS we'll discover the expected result that this
will only scan partitions p407, p408, p409, p507, p508 and p509.

When july comes to its end it is then time to add a new partition for
august 2006 which we do with a quick command:
ALTER TABLE t1 ADD PARTITION
(PARTITION p608 VALUES LESS THAN (TO_DAYS('2006-09-01'));

6 comments:

Anonymous said...

This works perfectly for partitioning per month but is limited for a partition per day.

The max number of partition authorized by MySQL is 1024 (see Mysql manual: "16.5 Restrictions and limitations on MySQL partitioning"). Therefore a partition per day could only be created on less than three years(3*365=1095).

Sebastien Chaoulli
schaoulli@yahoo.fr

Anonymous said...

please correct your script:

CREATE TABLE t1 (a date)
PARTITION BY RANGE(TO_DAYS(a))
(
PARTITION p3xx VALUES LESS THAN (TO_DAYS('2004-01-01')),
PARTITION p401 VALUES LESS THAN (TO_DAYS('2004-02-01')),
PARTITION p402 VALUES LESS THAN (TO_DAYS('2004-03-01')),

...

Anonymous said...

Having to have my partition column be in every single unique key was causing me pain on a table which had an auto_increment column that I couldn't do without, where I wanted to partition on a date field. I thought I was screwed until I discovered that I could simply extend my primary key definition to include the date column along with the auto_increment:

CREATE TABLE stuff
(
id integer auto_increment,
gmt_day integer,
primary key(id, gmt_day)
)
PARTITION by range(gmt_day)
(
PARTITION older values less than (13756),
PARTITION sep07 values less than (13786),
PARTITION newer values less than maxvalue
);

This may or may not work for you, but it certainly got me out of a spot of bother.

5.1.22-rc

Kumar Narayanan said...

Hi Mikael
I am looking for a disk based mysql cluster for my old school that can use some economic hardware and also where speed is not exactly a need...is there a how to that you would recommend for this??

Abbas Ahmed said...

Hi Kumar,

Can you detail what kind of hardware is available to you?

Abbas

Anonymous said...

I searched every resource on the net, even mysql official site. But I couldn't found any solution for error "#1503 - A PRIMARY KEY must include all columns in the table's partitioning function"

But your explanation helped to me finally. Thank you very much!..