Hello hackers,
I've been playing with partitioning on GPDB7 with mixed flavors of
grammar: upstream Postgres grammar and legacy GPDB grammar. I have the
following findings and I'm looking for your thoughts.
Historically[1] Greenplum didn't allow a partitioned table to have
zero child. This guard exists in gpdb6 and all prior versions for
CREATE TABLE and ALTER TABLE DROP PARTITION.
For instance, the following commands ERROR in GPDB6:
-- ERROR: no partitions specified at depth 1
create table foo (a int, b int) distributed by(a) partition by range(b);
-- This works
create table foo (a int, b int) distributed by(a) partition by range(b) (default partition def);
-- ERROR: cannot drop partition "def" of relation "foo" -- only one remains
-- HINT: Use DROP TABLE "foo" to remove the table and the final partition
alter table foo drop partition def;
In contrast, upstream Postgres does not have this restriction. A
partitioned table can be created with zero child, and the last child
partition can be dropped or detached from a parent table.
For instance, the follow commands work in Postgres:
create table foo (a int, b int) distributed by(a) partition by range(b);
-- PG-style equivalent of ALTER TABLE ADD PARTITION
create table foo_def partition of foo default;
-- PG-style equivalent of ALTER TABLE DROP PARTITION
drop table foo_def;
-- PG-style command similar to ALTER TABLE DROP PARTITION, but
-- foo_def will continue to exist as a table outside of the partitioning hierarchy
alter table foo detach partition foo_def;
Currently in GPDB7, we keep Greenplum's historical behavior for legacy
GPDB syntax as much as we could, and adopt the Upstream behavior for
Postgres syntax. Because GPDB7 supports both flavors of partitioning
grammar, it is currently really inconsistent with whether or not to
allow a parent partitioned table to have zero children.
Below is a summary of relevant GPDB7 commands.
create table foo (a int, b int) distributed by(a) partition by range(b);
postgres=# alter table foo add partition foo_p1 start (1) end (3);
ERROR: XX000: GPDB add partition syntax needs at least one sibling to exist (tablecmds_gp.c:1212)
LOCATION: ATExecGPPartCmds, tablecmds_gp.c:1212
create table foo_p1 (like foo);
alter table foo attach partition foo_p1 for values from (1) to (3);
postgres=# alter table foo drop partition for (1);
ERROR: 2BP01: cannot drop partition "foo_p1" of "foo" -- only one remains
HINT: Use DROP TABLE "foo" to remove the table and the final partition
alter table foo detach partition foo_p1;
create table bar (a int, b int, c int) partition by list (b);
create table bar_p1 partition of bar for values in (1,2) partition by range(c);
-- ALTER TABLE SET SUBPARTITION TEMPLATE ERRORs if the grandparent/parent table has zero child.
-- This is an scenario we wouldn't encounter if user has only used legacy pre-GPDB6 syntax.
postgres=# alter table bar set subpartition template (subpartition sp1 start (1) end (3));
ERROR: XX000: GPDB SET SUBPARTITION TEMPLATE syntax needs at least one sibling to exist (tablecmds_gp.c:1358)
LOCATION: ATExecGPPartCmds, tablecmds_gp.c:1358
While I was working on
https://github.com/greenplum-db/gpdb/pull/14211, I noticed two things:
We don't have enough test coverage for mixed usage of Postgres and
legacy GPDB style partitioning syntax.
GPDB7's ALTER TABLE SET SUBPARTITION TEMPLATE and ALTER TABLE ADD
PARTITION commands are "luck-based" for a heterogeneous partitioning
hierarchy that can be only created using Postgres syntax.
Consider the following example:
-- Create top level parent
CREATE TABLE heteropart (id int, year date, letter char(1)) DISTRIBUTED BY (id, letter, year) PARTITION BY list (letter);
-- Add a level 1 partition that is NOT PARTITIONED. This for now is the FIRST level 1 partition.
CREATE TABLE heteropart_c PARTITION OF heteropart FOR VALUES IN ('C');
-- Add a level 1 partition that is partitioned BY RANGE with ZERO child. This now becomes the FIRST level 1 partition.
CREATE TABLE heteropart_b PARTITION OF heteropart FOR VALUES IN ('B') PARTITION BY range (year);
-- Add a level 2 child of heteropart_b.
CREATE TABLE heteropart_b_sp1 PARTITION OF heteropart_b FOR VALUES FROM (date '2021-01-01') TO (date '2022-01-01');
-- Add a level 1 partition that is partitioned BY LIST with ZERO child. This now becomes the FIRST level 1 partition.
CREATE TABLE heteropart_a PARTITION OF heteropart FOR VALUES IN ('A') PARTITION BY list (year);
-- Add a level 2 child of heteropart_a.
CREATE TABLE heteropart_a_sp1 PARTITION OF heteropart_a FOR VALUES IN (date '2022-01-01');
**You'd probably wonder how we decide who is the first sibling
partition: It depends on the partition bounds. In the above example,
the level 1 partitions are sorted by the list bound values, hence
heteropart_a
(with VALUES IN ('A')) is the first partition, followed
by heteropart_b
(with VALUES in 'B'), and then heteropart_c
(with
VALUES in 'C'). (Refer to qsort_partition_list_value_cmp()
and
qsort_partition_rbound_cmp()
for more details.)
In the above example, the partitioning hierarchy is heterogeneous in
the sense that the sibling partitions on level 1 could be a leaf
table, a range partitioned table, or a list partitioned table, and the
portioned tables could also have zero children.
Now if we add a new partition using the legacy GPDB syntax:
-- ERROR: 42P16: no partitions specified at depth 2
ALTER TABLE heteropart ADD PARTITION e VALUES ('E');
This ERRORed because according to the FIRST level 1 sibling partition
(heteropart_a
), we needed a level 2 subpartition spec. Even though
we already have heteropart_c
who does not have one.
SET SUBPARTITION TEMPLATE suffers similar problem:
-- ERROR: 42P16: invalid boundary specification for LIST partition
ALTER TABLE heteropart SET SUBPARTITION TEMPLATE(
subpartition r1 START (date '2001-01-01') END (date '2003-01-01'),
subpartition r2 START (date '2003-01-01') END (date '2005-01-01') EVERY (interval '1 year')
);
This ERRORed because according to the FIRST level 1 sibling partition
(heteropart_a
), the partition strategy is LIST. Even though we
already have heteropart_b
who is partitioned by RANGE.
Now let's actually respect the FIRST level 1 partition
(heteropart_a
) and set a subpartition template:
-- this works
ALTER TABLE heteropart SET SUBPARTITION TEMPLATE(
subpartition r1 VALUES (date '2001-01-01')
);
-- this also works
ALTER TABLE heteropart ADD PARTITION e VALUES ('E');
Now DROP the current FIRST level 1 partition heteropart_a
.
DROP TABLE heteropart_a;
Now that heteropart_b
becomes the FIRST level 1 partition. The
subpartition template becomes unusable and we can no longer ADD new
partitions unless we remove/reset the template.
-- ERROR: 42P16: invalid boundary specification for RANGE partition
ALTER TABLE heteropart ADD PARTITION f VALUES ('F');
Therefore, as shown above, the ALTER TABLE ADD PARTITION and ALTER
TABLE SET SUBPARTITION TEMPLATE commands are supported with
best-effort: depending on who is the current FIRST sibling partition,
which is dictated by the present partition bounds, these two
commands may or may not work as the user desired. If a partition
template works now, it may become invalid later.
In my opinion there are two problems:
If we can fix problem 2, problem 1 will be alleviated.
Obviously, this requires the least amount of work. However,
SUBPARTITION TEMPLATE is a nice GPDB feature that allows users to
define and store a uniform subpartition structure in one go, so that
users can re-use it in a future ALTER TABLE ADD PARTITION command.
Having these two commands implemented with a best-effort approach is
really not our best effort.
template
Currently we specify subpartition template like this[2]:
SET SUBPARTITION TEMPLATE (subpartition_spec)
I propose we update/extend the syntax to:
SET SUBPARTITION TEMPLATE [ SUBPARTITION BY partition_type (column1)] (subpartition_spec)
This allows us to also store the PartitionKey info as a new column in
the gp_partition_template
catalog table. With this information
available as part of the template itself, we will no longer need to
retrieve the first partition when ALTER TABLE ADD PARTITION on a table
who has subpartition template.
Since this is catalog change, if we want it, we should do it now.
We could perhaps make the SUBPARTITION BY clause optional for backward
compatibility and still use the first partition's PartitionKey info if
user didn't specify one AND if we have at least one child partition
present, but we will always record the PartitionKey info in the
gp_partition_template
catalog table.
If a subpartition template (with the subpartition's PartitionKey
info) is present, we should allow ALTER TABLE ADD PARTITION even if
the parent table has zero children, because we wouldn't need anything
from the first child.
If subpartition template is not present, we could perhaps keep
current behavior and continue to ask from the first partition if
available.
Questions and/or thoughts?
[1] https://github.com/pivotal/gp-gpdb-historical/commit/01f9a0c799ff4c10452ea51b21d0fb5d329e3697
[2] https://gpdb.docs.pivotal.io/6-12/ref_guide/sql_commands/ALTER_TABLE.html