huansong

@huansong

Joined on Nov 14, 2022

  • Greenplum 7 (refert to GP7 here) has aligned the internal implementation of partitioned tables with the upstream PostgreSQL. Along with that, it has introduced new DDL syntax (also known as the modern syntax) and catalogs. The old GP6 style DDL syntax are still supported (also known as the classic syntax). But internally they are supported via entirely new implementation and have totally different catalogs. As a result, many GP6 system views that are dependent on outdated catalogs or assumptions are not supported anymore. We are planning to provide new alternatives to help partition maintainance but we are not looking to bring the exact same GP6 views back. Therefore, help is needed in migration of user scripts built on the GP6 catalog structure. This blog is going to address that. pg_partitions We start from the most commonly used ones. The view to show information of all leaf partitions in the current database, pg_partitions has columns as listed in the table below. With this table, we also show a one-line description about where you can find the same information in Greenplum 7, or simply N/A if the information is gone. Column What's it for in GP6 Where in GP7 schemaname
     Like  Bookmark
  • Triggers have limited support in GPDB. There has been questions regarding what exactly a GPDB user can do with triggers. This blog tries to clearly describe the current behaviors of triggers as of GPDB 7.0.0. Note that this might be subject to change in later releases. 1. Row triggers Row triggers are fundamentally limited in GPDB due to its MPP nature: since row triggers are fired on each primary segment, the trigger function can only access data on the segment itself. Therefore, there are a lot of limitations to what a row trigger function can do in GDPB. The trigger function cannot contain non-SELECT queries. Basically, the INSERT/UPDATE/DELETE queries, DDLs (like CREATE TABLE) and things like VACUUM, ANALYZE and CLUSTER are all disallowed in the trigger function. There would be error being thrown at trigger-firing time. For example let's define a trigger as below: CREATE TABLE t1 (a INT) DISTRIBUTED BY (a);
     Like  Bookmark
  • The ALTER TABLE commands are commonly used for database operations like adding column, changing the data type of a column, and a lot more. In many cases, such commands require the whole table to be rewritten while holding an exclusive lock on the table. For large tables this can be a very time-consuming process. As a result, it is often a problem in production because all the other operations on the table (including even SELECT) could be blocked for a long time. This problem has been discussed well in our previous blog Altered States: Greenplum Alter Table Command. In GPDB 7, we have made a few optimizations that help avoid table rewrite for many common ALTER TABLE commands, especially for the append-optimized tables. In this blog we will discuss about these changes. ADD COLUMN Thanks to an optimization introduced to PostgreSQL 11, GPDB 7 (which is based on PostgreSQL 12) now does not require table rewrite if you add a column to a heap table, as long as the new column's DEFAULT value is not volatile. Basically, from the perspective of existing rows in the table, the newly added column will be regarded as missing. When we read the existing rows, the missing columns will be filled with the DEFAULT value of the column. So we do not need to rewrite the existing rows at the time of ALTER TABLE. Here's some example in GPDB 7: psql (12.12) Type "help" for help.
     Like  Bookmark
  • Background Since PostgreSQL 11, the ALTER TABLE ... ADD COLUMN command does not require table rewrite anymore (there's also some implication about whether the default value is NULL or not, but let's ignore that for now). The optimization basically just records the default value for the new column in catalog, and use it when the attribute is missing in the tuple. However, currently in Greenplum, the appendonly tables have some limitation that makes it unable to enjoy the upstream feature. But among the AO-row and AO-column tables, the latter has its own optimization which is to write only the new columns in ADD COLUMN, so we are not too concerned with AO-column tables. Here's the situation of what's being written for Heap vs. AO-row vs. AO-column in Greenplum can be shown in this table: Heap AO-Row AO-Column
     Like  Bookmark
  • [color=#7ddb3b] This blog is one of a series of blogs to introduce partitioning in Greenplum 7: What's New: https://hackmd.io/FnJGS2d_RW-aNAm8K6mHeQ Recursion and Inheritance: https://hackmd.io/x6zxrkXkQ_2LILkI-UxM1w [WIP] Subpartition template: https://hackmd.io/IV9BKBcUSXibyVfwwReD0A Greenplum 7 is a huge milestone for partitioned tables. Besides several improvements and fixes, this is the first Greenplum version that will be aligned with partitioned tables from PostgreSQL world. A little background: before PostgreSQL 10, table partitioning could be done in very limited form in PostgreSQL, which is basically just a use case of table inheritance. In PostgreSQL 10 and onwards, users can start use declarative syntax to define their partition paradigm. For example:
     Like  Bookmark
  • [color=#7ddb3b] This blog is one of a series of blogs to introduce partitioning in Greenplum 7: What's New: https://hackmd.io/FnJGS2d_RW-aNAm8K6mHeQ Recursion and Inheritance: https://hackmd.io/x6zxrkXkQ_2LILkI-UxM1w [WIP] Subpartition template: https://hackmd.io/IV9BKBcUSXibyVfwwReD0A The partition hierarchy is often large and complex, and many times need hybrid table properties among different partions. It is important to understand the recursion behavior in order to get the right partition paradigm that one would like it to be. Similar to our previous blog, this blog will mainly focus on the new commands in Greenplum 7 and the changes in legacy Greenplum commands.
     Like  Bookmark