Try   HackMD

1. Semantic design change

1.1 Naming problem

In 7X, partitions are first-class tables. There is no such thing as alias (partition name) in 7X, therefore pg_partitons.partitionname and pg_partitons.parentpartitionname no longer make sense and not available either.

Take the following table partrl as an example:

-- Classic Syntax
-- Multi-level range-list partitioned table
CREATE TABLE partrl (a int, b int, c int)
DISTRIBUTED BY (a)
PARTITION BY range(b)
SUBPARTITION BY list(c)
(
    PARTITION p1 START (10) END (20) EVERY (5)
    (
        SUBPARTITION sp1 VALUES (1, 2)
    ),
    PARTITION p2 START (0) END (10)
    (
        SUBPARTITION sp2 VALUES (3, 4),
        SUBPARTITION sp1 VALUES (1, 2),
        DEFAULT SUBPARTITION others
    )
);

Below is the pg_partitions entry in 6X for leaf partition partrl_1_prt_p2_2_prt_sp1.

test=# select * from pg_partitions where partitiontablename = 'partrl_1_prt_p2_2_prt_sp1';
-[ RECORD 1 ]------------+------------------------------
schemaname               | public
tablename                | partrl
partitionschemaname      | public
partitiontablename       | partrl_1_prt_p2_2_prt_sp1
partitionname            | sp1
parentpartitiontablename | partrl_1_prt_p2
parentpartitionname      | p2
partitiontype            | list
partitionlevel           | 1
partitionrank            | 
partitionposition        | 2
partitionlistvalues      | 1, 2
partitionrangestart      | 
partitionstartinclusive  | 
partitionrangeend        | 
partitionendinclusive    | 
partitioneveryclause     | 
partitionisdefault       | f
partitionboundary        | SUBPARTITION sp1 VALUES(1, 2)
parenttablespace         | pg_default
partitiontablespace      | pg_default

partitionname = 'sp1' and parentpartitionname = 'p2' are not applicable in 7X. In 7X, we'd only have partitionschemaname = 'partrl_1_prt_p2' and parentpartitiontablename = 'partrl_1_prt_p2_2_prt_sp1'.

1.1.1 Why did we need partitionname and parentpartitionname in the first place?

It looks like parentpartitionname has been used to identify subpartitions. In 7X, there are multiple ways to acquire such information:

  • Option 1: use pg_class.relispartition = 't' and pg_class.relkind = 'p' to identify a middle-level table.
  • Option 2: use pg_partition_ancestors().
test=# SELECT * FROM pg_partition_ancestors('partrl_1_prt_p2_2_prt_sp1');
           relid           
---------------------------
 partrl_1_prt_p2_2_prt_sp1
 partrl_1_prt_p2
 partrl
(3 rows)

-- This tells that the given table has more than one partition ancestors,
-- hence it is a "subpartition".
test=# SELECT count(*) - 1 as num_of_ancestors FROM pg_partition_ancestors('partrl_1_prt_p2_2_prt_sp1');
 num_of_ancestors 
------------------
                2
(1 row)

1.2 Partition hierachy: heterogeneous in 7X vs. homogeneous in 6X

This design philosophy difference affects the usage of the pg_partition_columns view.

The entires of the pg_partition_columns view in 6X for the example partrl table looks like below:

test=# select * from pg_partition_columns where tablename = 'partrl';
 schemaname | tablename | columnname | partitionlevel | position_in_partition_key 
------------+-----------+------------+----------------+---------------------------
 public     | partrl    | b          |              0 |                         1
 public     | partrl    | c          |              1 |                         1
(2 rows)

However, in 7X, middle level partitions on the same level may be partitioned w/ differernt strategies AND partition keys. So, in 7X, using a given root table and a level can not uniquely identify the partition strategy and key(s). It seems this view is mostly used to fetch column name of specified partition level.

Example heterogeneous partition hierachy in 7X:

-- Modern Syntax hetrogeneous example
-- Multi-level range-<list|range> partitioned table
CREATE TABLE partrlr (a int, b int, c int) DISTRIBUTED BY (a) PARTITION BY range(b);

-- First child on level 1 is further partitioned by list on column c.
CREATE TABLE partrlr_p1 PARTITION OF partrlr FOR VALUES FROM (10) TO (15) PARTITION BY list(c);
CREATE TABLE partrlr_p1_sp1 PARTITION OF partrlr_p1 FOR VALUES IN (1, 2);

-- Second child on level 1 is further partition by range on column (a).
CREATE TABLE partrlr_p2 PARTITION OF partrlr FOR VALUES FROM (0) TO (10) PARTITION BY range(a);
CREATE TABLE partrlr_p2_sp1 PARTITION OF partrlr_p2 FOR VALUES FROM (3) TO (4);
CREATE TABLE partrlr_p2_sp2 PARTITION OF partrlr_p2 FOR VALUES FROM (1) TO (2);
CREATE TABLE partrlr_p2_others PARTITION OF partrlr_p2 DEFAULT;

In 7X, had we have the pg_partition_columns view available, the entries for partrlr would look like below:

test=# select * from pg_partition_columns where tablename = 'partrlr';
 schemaname | tablename | columnname | partitionlevel | position_in_partition_key 
------------+-----------+------------+----------------+---------------------------
 public     | partrlr    | b          |              0 |                         1
 public     | partrlr    | c          |              1 |                         1
 public     | partrlr    | a          |              1 |                         1
(3 rows)

This view seems unnecessary/misleading in 7X because:

  • column name can be easily fetched from pg_partitioned_table and pg_attribute.
  • w/ the flexibility in 7X, partition columns returned from this view may come from differerent partitions. Not sure if this is what users expect.
  • partition key can be expressions rather than columns

2. Syntatic change

2.1 FOR RANK is not valid syntax in 7X

Instead use FOR VALUE in ALTER TABLE ALTER PARTITION clause to refer to a child partition. Because of the removal of the FOR RANK syntax, pg_partitions.partitionrank is not required by many user queries, hence neither calculated nor stored in the catalog.

For range partition maintenance purposes, such as finding the highest or lowest partition boundries, we may still want to sort partitions by their partition boundaries, for those cases, we can evaluate the usage and create UDFs based on 7X catalog.

2.2 pg_dump and pg_restore use modern syntax

Keeping pg_partitions.partitionboundary which stores partition boundary specification in classic syntax may be unnecessary and even error prone.

3. Catalog change

3.1 pg_partitions.partitionposition is not stored in the catalog

For range partitions, partitionposition is same as partitionrank, they are not stored in the catalog. We can acquire them by writing custom queries or UDFs to sort the range partition boundaries.

List partition has problem with it.

For list partitions, partitionposition monotonically increases as new list partitions being created. This information we don't store in 7X's catalog, and we can't reconstruct either.

For hash partitions, which is new in Greenplum, we don't have a definition for partitionposition with hash partitions.

3.2 Some fields would be dummy

  • pg_partitions.partitionstartinclusive: always true
  • pg_partitions.partitionendinclusive: always false
  • pg_partitions.partitioneveryclause: always NULL

3.3 pg_class.relpartbound stores partition boundaries in 7X

Example in 7X:

test=# select relname, pg_get_expr(relpartbound, c.oid) as relpartbound from pg_class c join pg_partition_tree('partrl') p on c.oid = p.relid;
           relname            |         relpartbound         
------------------------------+------------------------------
 partrl                       | 
 partrl_1_prt_p1_1            | FOR VALUES FROM (10) TO (15)
 partrl_1_prt_p1_2            | FOR VALUES FROM (15) TO (20)
 partrl_1_prt_p2              | FOR VALUES FROM (0) TO (10)
 partrl_1_prt_p1_1_2_prt_sp1  | FOR VALUES IN (1, 2)
 partrl_1_prt_p1_2_2_prt_sp1  | FOR VALUES IN (1, 2)
 partrl_1_prt_p2_2_prt_others | DEFAULT
 partrl_1_prt_p2_2_prt_sp2    | FOR VALUES IN (3, 4)
 partrl_1_prt_p2_2_prt_sp1    | FOR VALUES IN (1, 2)
(9 rows)

from which partitionrangestart, partitionrangeend, partitionlistvalues can be parsed, if needed.

3.4 pg_partitions.partitionlevel is one off

Taking partrl as an example:

In 7X, root partition is at level 0 and leaf partitions are at level 2:

test=# select * from pg_partition_tree('partrl');
            relid             |    parentrelid    | isleaf | level 
------------------------------+-------------------+--------+-------
 partrl                       |                   | f      |     0
 partrl_1_prt_p1_1            | partrl            | f      |     1
 partrl_1_prt_p1_2            | partrl            | f      |     1
 partrl_1_prt_p2              | partrl            | f      |     1
 partrl_1_prt_p1_1_2_prt_sp1  | partrl_1_prt_p1_1 | t      |     2
 partrl_1_prt_p1_2_2_prt_sp1  | partrl_1_prt_p1_2 | t      |     2
 partrl_1_prt_p2_2_prt_others | partrl_1_prt_p2   | t      |     2
 partrl_1_prt_p2_2_prt_sp2    | partrl_1_prt_p2   | t      |     2
 partrl_1_prt_p2_2_prt_sp1    | partrl_1_prt_p2   | t      |     2
(9 rows)

In 6X, pg_partitions doesn't have an entry for root table partrl, and partitionlevel starts from 0 for root table's direct children and 1 for the leaf partitions:

test=# select partitiontablename, parentpartitionname, tablename, partitionlevel from pg_partitions where partitiontablename in ('partrl_1_prt_p1_1', 'partrl_1_prt_p1_1_2_prt_sp1');
     partitiontablename      | parentpartitionname | tablename | partitionlevel 
-----------------------------+---------------------+-----------+----------------
 partrl_1_prt_p1_1           |                     | partrl    |              0
 partrl_1_prt_p1_1_2_prt_sp1 | p1_1                | partrl    |              1
(2 rows)

So pg_partitions.partitionlevel and level returned from pg_partition_tree() is one off, which is not a problem, but can be confusing.

Summary

pg_partition_columns

schema of this view is problematic for

  • heterogeneous partition hierachy, AND
  • expression as partition key(s)

pg_partitions

Unavailable fields:

  • partitionname
  • parentpartitionname
  • partitionposition (List and Hash partitions)

Dummy fields:

  • partitionstartinclusive
  • partitionendinclusive
  • partitioneveryclause

Potential irrelavent field:

  • partitionboundary

Potential misleading field:

  • partitionlevel

[WIP] What do we really need? (What is currently missing in 7X)

  1. Given a child table and its parent table, extract following fine-grained range partition information from pg_class.relpartbound for partition maintenance:
  • partitionrank
  • partitionrangestart
  • partitionrangeend
  • partitionisdefault
  1. Looke more into what is needed for list partition maintenance, pg_class.relpartbound has them:
  • partitionlistvalues
  • partitionisdefault
  1. Given a root table, maybe we can have a function which tells weather or not the partition hierachy is homogeneous.
  2. Use pg_dump to get DDLs instead of pg_get_partition_def()
  3. Get other trivial information, such as schema, tablespace, attributenames elsewhere.

[TODO] Partition maintenance alternatives

Need to look more into pg_partman.

Appendix

pg_partitions

test=# \d pg_partitions
         View "pg_catalog.pg_partitions"
          Column          |   Type   | Modifiers 
--------------------------+----------+-----------
 schemaname               | name     | 
 tablename                | name     | 
 partitionschemaname      | name     | 
 partitiontablename       | name     | 
 partitionname            | name     | 
 parentpartitiontablename | name     | 
 parentpartitionname      | name     | 
 partitiontype            | text     | 
 partitionlevel           | smallint | 
 partitionrank            | bigint   | 
 partitionposition        | smallint | 
 partitionlistvalues      | text     | 
 partitionrangestart      | text     | 
 partitionstartinclusive  | boolean  | 
 partitionrangeend        | text     | 
 partitionendinclusive    | boolean  | 
 partitioneveryclause     | text     | 
 partitionisdefault       | boolean  | 
 partitionboundary        | text     | 
 parenttablespace         | name     | 
 partitiontablespace      | name     | 

Easy to get, naming convention may not be consistant w/ new catalog:
schemaname: root table's schema name
tablename: root table's table name
partitionschemaname: this partition's schema name
partitionname: this partitions table name
parentpartitiontablename: parent table's schema name is parent is not root
parentpartitionname: parent table's table name if parent is not root
partitiontype: parent's partition by strategy
partitionlevel: parent's partition level
partitionposition:
partitionlistvalues: parse text from list partition's relpartbound
partitionrangestart: parse text from range partition's relpartbound
partrangestartinclusive: if not null always true
partitionrangeend: parse text from range partition's relpartbound
partitionrangeinclusive: if not null always false
partitionisdefault: parse from relpartbound
parenttablespce:
partitiontablespace:

Missing in new catalog tables but can write UDF:
partitionrank: for range partition only, starting from 1. index of this partition after ordering the range partition boundaries
partitionposition: same as partitionrank for range partitions, monotonically increasing number for list partitions

Need to understand usage:
partitionboundary: convert pg_get_expr(relpartbound) to classic syntax

pg_partition_columns

test=# \d pg_partition_columns
      View "pg_catalog.pg_partition_columns"
          Column           |   Type   | Modifiers 
---------------------------+----------+-----------
 schemaname                | name     | 
 tablename                 | name     | 
 columnname                | name     | 
 partitionlevel            | smallint | 
 position_in_partition_key | integer  | 

schemaname: root table's schema name
tablename: root table's table name
columnname: partition key
partitionlevel: start from 0
position_in_partition_key: always 0 for range partition and can be greater than 0 for list partition.

Test tables

-- Classic Syntax
-- Multi-level range-list partitioned table
CREATE TABLE partrl (a int, b int, c int)
DISTRIBUTED BY (a)
PARTITION BY range(b)
SUBPARTITION BY list(c)
(
    PARTITION p1 START (10) END (20) EVERY (5)
    (
        SUBPARTITION sp1 VALUES (1, 2)
    ),
    PARTITION p2 START (0) END (10)
    (
        SUBPARTITION sp2 VALUES (3, 4),
        SUBPARTITION sp1 VALUES (1, 2),
        DEFAULT SUBPARTITION others
    )
);

-- Modern Syntax hetrogeneous example
-- Multi-level range-<list|range> partitioned table
CREATE TABLE partrlr (a int, b int, c int) DISTRIBUTED BY (a) PARTITION BY range(b);

-- First child on level 1 is further partitioned by list on column c.
CREATE TABLE partrlr_p1 PARTITION OF partrlr FOR VALUES FROM (10) TO (15) PARTITION BY list(c);
CREATE TABLE partrlr_p1_sp1 PARTITION OF partrlr_p1 FOR VALUES IN (1, 2);

-- Second child on level 1 is further partition by range on column (a).
CREATE TABLE partrlr_p2 PARTITION OF partrlr FOR VALUES FROM (0) TO (10) PARTITION BY range(a);
CREATE TABLE partrlr_p2_sp1 PARTITION OF partrlr_p2 FOR VALUES FROM (3) TO (4);
CREATE TABLE partrlr_p2_sp2 PARTITION OF partrlr_p2 FOR VALUES FROM (1) TO (2);
CREATE TABLE partrlr_p2_others PARTITION OF partrlr_p2 DEFAULT;

-- Classic Syntax
-- Multi-level range-range partitioned table
CREATE TABLE partrr (a int, b int, c int)
DISTRIBUTED BY (a)
PARTITION BY range(b)
SUBPARTITION BY range(c)
(
    PARTITION p1 START (0) END (10)
    (
        DEFAULT SUBPARTITION other_c
    ),
    PARTITION p2 START (10) END (20)
    (
        SUBPARTITION sp2 START (300) END (600),
        SUBPARTITION sp1 START (100) END (300),
        DEFAULT SUBPARTITION other_c
    )
);

-- Sales
CREATE TABLE sales (trans_id int, date date, amount
decimal(9,2), region text)
DISTRIBUTED BY (trans_id)
PARTITION BY RANGE (date)
SUBPARTITION BY LIST (region)
SUBPARTITION TEMPLATE
      ( SUBPARTITION usa VALUES ('usa'),
      SUBPARTITION asia VALUES ('asia'),
      SUBPARTITION europe VALUES ('europe'),
      DEFAULT SUBPARTITION other_regions)
  (START (date '2023-01-01') INCLUSIVE
   END (date '2023-05-01') EXCLUSIVE,
  DEFAULT PARTITION other_date);