GDPB partition: catalogs/views

# Greenplum 7 partition: migration guide for system catalogs/views Greenplum 7 (refert to GP7 here) has aligned the internal implementation of partitioned tables with the upstream PostgreSQL. Along with that, it has introduced new DDL syntax (also known as the [ *modern* syntax](https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-ddl-about-part-changes.html)) and catalogs. The old GP6 style DDL syntax are still supported (also known as the [*classic* syntax](https://docs.vmware.com/en/VMware-Greenplum/7/greenplum-database/admin_guide-ddl-ddl-partition.html#:~:text=the%20best%20strategy.-,Choosing%20the%20Partitioning%20Syntax,-Greenplum%20Database%207)). But internally they are supported via entirely new implementation and have totally different catalogs. As a result, many GP6 system views that are dependent on outdated catalogs or assumptions are not supported anymore. We are planning to provide new alternatives to help partition maintainance [but we are not looking to bring the exact same GP6 views back](https://hackmd.io/@l-wang/S1-DQYdq2). Therefore, help is needed in migration of user scripts built on the GP6 catalog structure. This blog is going to address that. ## `pg_partitions` We start from the most commonly used ones. The view to show information of all leaf partitions in the current database, [`pg_partitions`](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/ref_guide-system_catalogs-pg_partitions.html) has columns as listed in the table below. With this table, we also show a one-line description about where you can find the same information in Greenplum 7, or simply N/A if the information is gone. | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |schemaname|The name of the schema the partition root table is in| Get root OID via `pg_partition_root()` and then query pg_namespace | |tablename|The name of the partition root table| Get root OID via `pg_partition_root()` and then query pg_class | |partitionschemaname|The namespace of the leaf partition | pg_namespace | |partitiontablename|The table name of the leaf partition (this is the table name to use if accessing the partition directly)| pg_class | |partitionname|The partition name of the leaf partition (this is the name to use if referring to the partition in an ALTER TABLE command)| N/A | |parentpartitiontablename|The table name of the parent table one level up from this partition| Get parent OID via pg_inherits and then query pg_class | |parentpartitionname|The partition name of the parent table one level up from this partition| N/A | |partitiontype|The type of partition (range or list)| Get parent OID via pg_inherits and then query `pg_partitioned_table` | |partitionlevel|The level of this partition in the hierarchy| Get level from `pg_partition_tree()` using the parent OID | |partitionrank|For range partitions, the rank of the partition compared to other partitions of the same level| N/A | |partitionposition|The rule order position of this partition| N/A | |partitionlistvalues|For list partitions, the list value(s) associated with this partition| Get partition bound via `pg_get_expr()` and then filter text | |partitionrangestart|For range partitions, the start value of this partition| Get partition bound via `pg_get_expr()` and then filter text | |partitionstartinclusive|T if the start value is included in this partition| always inclusive | |partitionrangeend|For range partitions, the end value of this partition| Get partition bound via `pg_get_expr()` and then filter text | |partitionendinclusive|T if the end value is included in this partition| always exclusive | |partitioneveryclause|The EVERY clause (interval) of this partition| N/A | |partitionisdefault|T if this is a default partition, otherwise F| Get partition bound via `pg_get_expr()` and check if it is 'DEFAULT' | |partitionboundary|The entire partition specification for this partition| `pg_get_expr()` but it is in modern syntax | |parenttablespace|The tablespace of the parent table one level up from this partition| Get parent OID via pg_inherits and then query pg_tablespace | |partitiontablespace|The tablespace of this partition| pg_tablespace | Some of the columns are quite trivial to retrieve, but some of them are not. Next we will talk about the intricacies in details and provide a few concrete examples on how to retrieve the same (or at least similar) information in GP7. 1. `partitionname` / `parentpartitionname` These columns give the "partition name" in constrast to the actual table name. In our [previous blog](https://greenplum.org/partition-in-greenplum-7-whats-new/#:~:text=Partition%20name%20vs.%20Table%20name), we have discussed their difference. Because in GP7, we do not store the partition name in catalog anymore (used to be in `pg_partition_rule`), we cannot retrieve that information now. We also recommend just using the table names directly when referring to the partitions. However, if you need to get what is equivalent to partition name, you can achieve that via some text massaging: ```sql SELECT c.relname AS table_name, split_part(substr(c.relname, position(inh.inhparent::regclass::text in c.relname) + length(inh.inhparent::regclass::text)), '_', 4) AS partition_name FROM pg_class c LEFT JOIN pg_inherits inh ON inh.inhrelid = c.oid WHERE c.relname LIKE concat(inh.inhparent::regclass::text, '%') AND c.relname LIKE '<your_partition_name>'; ``` This is based on the assumption that table name generated implicitly from partition name is in the form of `<parent_tablename>_prt_<level>_<partition_name>`. Note that, however, if the table name is not internally generated but is specified by user, then the trick does not work. That will be the case not only for modern syntax, where you always need to specify the table name when adding partition, but also true even with classic syntax where you can explicitly specify the table name in a `WITH` clause for the partition. So again, it would be much better for users to just refer to their table names in GP7. 2. `partitionposition`/`partitionrank` The `partitionposition`/`partitionrank` columns in the `pg_partitions` view are based on the "order" number for each partition in the `pg_partition_rule` catalog. That is gone in GP7. However, if you just want to get the sorted order for range partitions, it can be achieved simply by ordering the range value (either START or END is fine). For example, say we have the below query that gives the highest rank partition: ```sql! SELECT CASE WHEN partitionstartinclusive THEN partitionrangestart ELSE partitionrangeend END AS old_part_value FROM pg_catalog.pg_partitions p WHERE p.schemaname = '" + cfg.schema + "' AND p.tablename = '" + cfg.table + "' AND p.partitiontype = 'range' AND p.partitionlevel = " + str(cfg.partition_level) + " AND " + chk + " ORDER BY partitionrank LIMIT 1; ``` A similar query in GP7 would be: ```sql! SELECT rank() over ( partition by pc.oid order by CAST( (regexp_matches(pg_get_expr(c.relpartbound, c.oid), 'FOR VALUES FROM \(([0-9]+)\) TO \(([0-9]+)\)'))[1] AS INTEGER ) ) AS rank, (regexp_matches(pg_get_expr(c.relpartbound, c.oid), 'FOR VALUES FROM \(([0-9]+)\) TO \(([0-9]+)\)'))[1] AS old_part_value FROM pg_class c LEFT JOIN pg_inherits inh ON inh.inhrelid = c.oid LEFT JOIN pg_partitioned_table pt ON inh.inhparent = pt.partrelid LEFT JOIN pg_class pc ON pc.oid = pt.partrelid WHERE c.relispartition = 't' AND pc.relname LIKE ('<partition_root_schema>.<partition_root_name>') ORDER BY rank LIMIT 1; ``` Assuming the partitioned table is a range one, the above query calculates the ranks for each partition under it, and prints the range START value (which is always inclusive) with the highest rank. However, for list partitions, because the order is based on **the times when the partition is added**, and GP7 does not store that piece of information anymore, it is not possible to get the order number in that case. The query that uses it for list partitions would have to be changed in a way that it is not dependent on that information. 3. `partitioneveryclause` Similar to previous cases, GP7 does not keep the use of `EVERY` clause in catalog anymore. So we won't know if a partition is created out of `EVERY` clause or not. But according to our understanding, this column has been rarely used in partition maintainance scripts. Its main use is to reconstruct partition definition clause using the classic-syntax for showing or pg_dump purposes. That would be something we would like to address in another blog (basically how to get equivalent of the `pg_get_partition_def` UDF). 4. `partitionstartinclusive` / `partitionendinclusive` The classic syntax `INCLUSIVE` and `EXCLUSIVE` are still supported but through [adjustment of START and END values](https://greenplum.org/partition-in-greenplum-7-whats-new/#:~:text=START%7CEND%20vs%20FROM%7CTO). As a result, we do not record whether a range START or END is inclusive or not, because the START is always inclusive and the END is always exclusive. So these two columns are simply redundant. 5. `partitionboundary` The partition boundary definition can be retrieved easily in GP7 via the `pg_get_expr()` function: ```sql SELECT pg_get_expr(relpartbound, oid) FROM pg_class WHERE relispartition = 't'; ``` But note that it will be in modern syntax. It can be used in similar way in modern syntax DDL through `ATTACH PARTITION` or `CREATE TABLE ... PARTITION OF` commands. ### An "approximate" `pg_partitions` Here we provide an "approximate" `pg_partitions` view that, with all the limitation in mind, can prints what is possible to show. Note that this is only for understanding and lookup purposes, not for practical use. Again we are working on providing better forms for some of the useful information ```sql! SELECT (SELECT relnamespace::regnamespace FROM pg_class WHERE oid = pg_partition_root(c.oid)) AS schemaname, (SELECT pg_partition_root(c.oid)::regclass) AS tablename, n.nspname AS partitionschemaname, c.relname AS partitiontablename, -- assuming the table name is implicitly generated like <parent>_prt_<id>_<partition> split_part( substr(c.relname, position(pc.relname in c.relname) + length(pc.relname)), '_', 4 ) AS partitionname, pc.relname AS parentpartitiontablename, -- same assumption as above split_part( substr(pc.relname, position(ppc.relname in pc.relname) + length(ppc.relname)), '_', 4 ) AS parentpartitionname, CASE WHEN pt.partstrat = 'r' THEN 'range' ELSE 'list' END AS partitiontype, (SELECT level FROM pg_partition_tree(pg_partition_root(c.oid)) WHERE relid = c.oid) AS partitionlevel, -- can be calculated like a previous example NULL AS partitionrank, -- cannot be trusted because no real order for list partitions NULL AS partitionposition, (regexp_matches(pg_get_expr(c.relpartbound, c.oid), 'FOR VALUES IN \((.*?)\)'))[1] AS partitionlistvalues, -- assuming range values are normal integers instead of expressions which GP7 supports (regexp_matches(pg_get_expr(c.relpartbound, c.oid), 'FOR VALUES FROM \(([0-9]+)\) TO \(([0-9]+)\)'))[1] AS partitionrangestart, 't' AS partitionstartinclusive, (regexp_matches(pg_get_expr(c.relpartbound, c.oid), 'FOR VALUES FROM \(([0-9]+)\) TO \(([0-9]+)\)'))[2] AS partitionrangeend, 'f' AS partitionendinclusive, -- information cannot be retrieved NULL AS partitioneveryclause, CASE WHEN pg_get_expr(c.relpartbound, c.oid) = 'DEFAULT' THEN 't' ELSE 'f' END AS partitionisdefault, (regexp_matches(pg_get_expr(c.relpartbound, c.oid), '.+'))[1] AS partitionboundary, CASE WHEN pc.reltablespace = 0 THEN 'pg_default' ELSE (SELECT spcname FROM pg_tablespace WHERE oid = pc.reltablespace) END AS parenttablespace, CASE WHEN c.reltablespace = 0 THEN 'pg_default' ELSE (SELECT spcname FROM pg_tablespace WHERE oid = c.reltablespace) END AS partitiontablespace FROM pg_class c LEFT JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_inherits inh ON inh.inhrelid = c.oid LEFT JOIN pg_partitioned_table pt ON inh.inhparent = pt.partrelid LEFT JOIN pg_class pc ON pc.oid = pt.partrelid LEFT JOIN pg_inherits pinh ON pinh.inhrelid = pc.oid LEFT JOIN pg_partitioned_table ppt ON pinh.inhparent = ppt.partrelid LEFT JOIN pg_class ppc ON ppc.oid = ppt.partrelid WHERE c.relispartition = 't'; ``` ## `pg_partition_columns` Another frequently used view, the `pg_partition_columns` is a simpler one which shows information about all the partition keys (columns) in the current database: | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |schemaname|The name of the schema the partitioned table is in| Get root OID via `pg_partition_root()` and then query pg_namespace | |tablename|The table name of the partitioned table| pg_class | |columnname|The name of the partition key column| pg_attribute | |partitionlevel|The level of this subpartition in the hierarchy| Get level from `pg_partition_tree()` using the parent OID | |position_in_partition_key|For list partitions you can have a composite (multi-column) partition key. This shows the position of the column in a composite key.| always 1 | Similarly, there is a column that is not relevant in GP7 anymore: `position_in_partition_key` because we now support multi-column partition key [via multi-column type](https://greenplum.org/partition-in-greenplum-7-whats-new/#:~:text=Multi%2Dcolumn%20PARTITION%20BY). So the relative position of the column in the partition key will always be one. And there is a more important reason why it does not work in GP7: the view assumes a homogenous partition structure which is no longer the only truth in GP7 as it supports a heterogenous structure as well. This has been discussed in details in [this blog](https://hackmd.io/GD32YdX3RXyNEJ-Kyzo_kw?view#12-Partition-hierachy-heterogeneous-in-7X-vs-homogeneous-in-6X). But we can still provide an approximate view with these two assumptions: (1) we do not have multi-column type partitioned table, and (2) all partition structures are a homogeneous. ```sql SELECT c.relnamespace::regnamespace AS schemaname, c.relname AS tablename, att.attname AS columnname, ( SELECT level FROM pg_partition_tree(pg_partition_root(c.oid)) WHERE relid = c.oid ) AS partitionlevel, 1 AS position_in_partition_key FROM pg_partitioned_table pt LEFT JOIN pg_class c ON c.oid = pt.partrelid JOIN pg_attribute att ON c.oid = att.attrelid AND att.attnum = pt.partattrs[0]; ``` ## `pg_partition_rule` The catalog table `pg_partition_rule` keeps information about the partition rules of all leaf partitions. In most of the time `pg_partitions` is used instead of the `pg_partition_rule` catalog. The information in this catalog (besides those N/A) is now kept mostly in `pg_class`, `pg_partitioned_table` etc. And, as we will see below, most information that still makes sense can be retrieved in same way as `pg_partitions`, so we do not write an "approximate" view for this catalog table. Please refer to the `pg_partitions` section if you need to look up for necessary info in this table. | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |paroid |Row identifier of the partitioning level (from pg_partition) to which this partition belongs. In the case of a branch partition, the corresponding table (identified by pg_partition_rule) is an empty container table. In case of a leaf partition, the table contains the rows for that partition containment rule.| N/A (gone object) | |parchildrelid| The table identifier of the partition. | pg_class | |parparentrule| The row identifier of the rule associated with the parent table of this partition. |N/A (gone object)| |parname| The given name of this partition. | pg_class | |parisdefault| Whether or not this partition is a default partition.| See description for `partitionisdefault` column in `pg_partitions` | |parruleord| For range partitioned tables, the rank of this partition on this level of the partition hierarchy. | N/A (see explanation for `partitionposition` / `partitionrank` columns in `pg_partitions`)| |parrangestartincl| For range partitioned tables, whether or not the starting value is inclusive. |always inclusive| |parrangeendincl| For range partitioned tables, whether or not the ending value is inclusive. |always exclusive| |parrangestart| For range partitioned tables, the starting value of the range. | See description for `partitionrangestart` column in `pg_partitions` | |parrangeend| For range partitioned tables, the ending value of the range. | See description for `partitionrangeend` column in `pg_partitions` | |parrangeevery| For range partitioned tables, the interval value of the EVERY clause. |N/A (see description for `partitioneveryclause` column in `pg_partitions`)| |parlistvalues| For list partitioned tables, the list of values assigned to this partition. | See description for `partitionlistvalues` column in `pg_partitions`| |parreloptions| An array describing the storage characteristics of the particular partition. | pg_class.reloptions | ## `pg_partition` The catalog table `pg_partition` (not to be confused with the `pg_partitions` view) keeps information of the partitioned root table which are mostly in `pg_partitioned_table` in GP7. It can be used very similarly: | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |parrelid |The object identifier of the table.| `pg_partitioned_table.partrelid` | |parkind |The partition type - R for range or L for list.| `pg_partitioned_table.partstrat` | |parlevel |The partition level of this row: 0 for the top-level parent table, 1 for the first level under the parent table, 2 for the second level, and so on.| Cross-check `pg_partitioned_table` and `pg_partition_root`/`pg_partition_tree`, similar to how we can get `pg_partitions.partitionlevel`| |paristemplate |Whether or not this row represents a subpartition template definition (true) or an actual partitioning level (false).| Cross-check `pg_partitioned_table.partrelid` with `gp_partition_template` | |parnatts |The number of attributes that define this level.| `pg_partitioned_table.partnatts` | |paratts |An array of the attribute numbers (as in pg_attribute.attnum) of the attributes that participate in defining this level.| `pg_partitioned_table.partattrs` | |parclass |The operator class identifier(s) of the partition columns.| `pg_partitioned_table.partclass` | ## `pg_partition_template` The view `pg_partition_template` is used to show subpartitions that were created using a subpartition template. As we can see, all of the information here is same as what you can get from the `pg_partitions` view. The only difference is just that `pg_partition_template` only shows those created by subpartition template, which can be cross-checked with `pg_partition.paristemplate` of the root table. Note that, however, this is based on the working assumption in GP6 that if the root has a subpartition template, then all of its subpartitions are created by subpartition template. In GP7 this is no longer true with the `ATTACH PARTITION`, so user needs to be mindful about that when rewriting their script with information provided here. | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |schemaname |The name of the schema the partitioned table is in. | See `pg_partitions.schemaname` | |tablename |The table name of the top-level parent table. | See `pg_partitions.tablename`| |partitionname |The name of the subpartition (this is the name to use if referring to the partition in an ALTER TABLE command). NULL if the partition was not given a name at create time or generated by an EVERY clause. | See `pg_partitions.partitionname` | |partitiontype |The type of subpartition (range or list). | See `pg_partitions.partitiontype`| |partitionlevel |The level of this subpartition in the hierarchy. | See `pg_partitions.partitionlevel` | |partitionrank |For range partitions, the rank of the partition compared to other partitions of the same level. | See `pg_partitions.partitionrank`| |partitionposition |The rule order position of this subpartition. | See `pg_partitions.partitionposition` | |partitionlistvalues |For list partitions, the list value(s) associated with this subpartition. | See `pg_partitions.partitionlistvalues`| |partitionrangestart |For range partitions, the start value of this subpartition. | See `pg_partitions.partitionrangestart`| |partitionstartinclusive |T if the start value is included in this subpartition. F if it is excluded. | See `pg_partitions.partitionstartinclusive`| |partitionrangeend |For range partitions, the end value of this subpartition. | See `pg_partitions.partitionrangeend`| |partitionendinclusive |T if the end value is included in this subpartition. F if it is excluded. | See `pg_partitions.partitionendinclusive`| |partitioneveryclause |The EVERY clause (interval) of this subpartition. |See `pg_partitions.partitioneveryclause` | |partitionisdefault |T if this is a default subpartition, otherwise F. | See `pg_partitions.partitionisdefault`| |partitionboundary |The entire partition specification for this subpartition. |See `pg_partitions.partitionboundary` | ## `pg_partition_encoding` The catalog table keeps available column storage options for a subpartition template. This information is now kept in `gp_partition_template` but needs to retrieve as a one-line definition of the template via `pg_get_expr`, e.g.: ```sql postgres=# SELECT level, pg_get_expr(template, relid) from gp_partition_template where relid::regclass::text = 'ccddl'; level | pg_get_expr -------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 1 | SUBPARTITION TEMPLATE(SUBPARTITION sp1 VALUES (1, 2, 3, 4, 5), COLUMN i ENCODING (compresstype=zlib), COLUMN j ENCODING (compresstype=rle_type), COLUMN k ENCODING (compresstype=zlib), COLUMN l ENCODING (compresstype=zlib)) ``` So some similar text filtering is needed if one wants to get the per-column encoding information like the original `pg_partition_encoding` table: | Column | What's it for in GP6 | Where in GP7 | | -------- | -------- | -------- | |parencoid | Partitioned table OID which this subpartition template is under. | `gp_partition_template` | |parencattnum | The attribute number of the column which the encoding option is for. | `pg_get_expr(template, <root partition OID>)` | |parencattoptions | The storage option of this column. | `pg_get_expr(template, <root partition OID>)` | ## `pg_stat_partition_operations` The view `pg_stat_partition_operations` (see[ the schema in documentation](https://docs.vmware.com/en/VMware-Greenplum/6/greenplum-database/ref_guide-system_catalogs-pg_stat_partition_operations.html)) basically shows the same information as `pg_stat_operations` but only for the partitioned table (including partitions). It also includes a few columns for the partitions themselves. We can retrieve the information with a query such as: ```sql! SELECT so.*, (SELECT level FROM pg_partition_tree(pg_partition_root(c.oid)) WHERE relid = c.oid) AS partitionlevel, pc.relname AS parentpartitiontablename, pc.relnamespace::regnamespace AS parentschemaname, pc.oid AS parent_relid FROM pg_stat_operations so LEFT JOIN pg_class c ON so.objid = c.oid LEFT JOIN pg_namespace n ON n.oid = c.relnamespace LEFT JOIN pg_inherits inh ON inh.inhrelid = c.oid LEFT JOIN pg_partitioned_table pt ON inh.inhparent = pt.partrelid LEFT JOIN pg_class pc ON pc.oid = pt.partrelid WHERE c.relispartition = 't' OR c.oid IN (SELECT partrelid FROM pg_partitioned_table); ```