# Triggers in GPDB7
Triggers have limited support in GPDB. There have been questions regarding what exactly a GPDB user can do with triggers. This blog tries to clearly describe the **current** behavior of triggers as of GPDB 7.0.0. Note that this might be subject to change in later releases.
## 1. Row triggers
Row triggers are fundamentally limited in GPDB due to its MPP nature: since row triggers are fired on each primary segment, the trigger function can only access data on the segment itself. Therefore, there are a lot of limitations on what a row trigger function can do in GPDB.
### The trigger function cannot contain non-SELECT queries
Basically, INSERT/UPDATE/DELETE queries, DDLs (like CREATE TABLE), and utility commands like VACUUM, ANALYZE and CLUSTER are all disallowed in the trigger function. An error will be thrown at trigger-firing time. For example, let's define a trigger as below:
```sql!
CREATE TABLE t1 (a INT) DISTRIBUTED BY (a);
CREATE TABLE trigtab (a INT, b INT) DISTRIBUTED BY (a);
CREATE FUNCTION insert_tri_func()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
INSERT INTO t1 VALUES(1);
RETURN NULL;
END;
$$;
CREATE TRIGGER insert_tri
AFTER INSERT ON trigtab
FOR EACH ROW
EXECUTE FUNCTION insert_tri_func();
```
Inserting into the table, we will see:
```sql
postgres=# insert into trigtab values(1);
ERROR: function cannot execute on a QE slice because it issues a non-SELECT statement (seg1 127.0.1.1:7003 pid=321)
CONTEXT: SQL statement "INSERT INTO t1 VALUES(1)"
PL/pgSQL function insert_tri_func() line 3 at SQL statement
```
As shown by the error message, the ERROR is a generic one that is not tied to triggers: a function cannot modify the database if it is executed on a single segment ("QE" means Query Executor; see [this blog](https://greenplum.org/introduction-to-greenplum-architecture/) if you are not familiar with the term). In fact, most of the other limitations on triggers are of a similar nature.
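To see that this restriction is not specific to triggers, here is a minimal sketch (the function name is hypothetical, and it reuses the `t1` and `trigtab` tables from above, assuming `trigtab` already has some rows on the segments): the same function body works when run on the QD, but is expected to fail with the same error once it is evaluated on the QEs.
```sql!
-- A plain (non-trigger) function that modifies the database.
CREATE FUNCTION plain_insert_func() RETURNS int
LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO t1 VALUES (1);
    RETURN 1;
END;
$$;
-- Executed on the QD: works fine.
SELECT plain_insert_func();
-- Evaluated per row of a distributed table, i.e. on the QEs:
-- expected to fail with the same "cannot execute on a QE slice" error.
SELECT plain_insert_func() FROM trigtab;
```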
### Trigger function can only SELECT from catalogs or replicated tables
Similar to the previous point, even though we can SELECT in a trigger function, we cannot SELECT from a table that has data on other segments. In other words, the current segment has to contain the *full table* - which means the table can only be a catalog or a replicated table. For example, you may have something like:
```sql!
CREATE TABLE trigtab (a INT, b INT) DISTRIBUTED BY (a);
CREATE FUNCTION insert_tri_func()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
DECLARE
rowcount int;
contentid int;
BEGIN
SELECT reltuples
INTO rowcount
FROM pg_class
WHERE relname = 'trigtab';
contentid := current_setting('gp_contentid')::int;
RAISE NOTICE 'Inserted a row into table trigtab. The row count on segment % at last ANALYZE is %', contentid, rowcount;
RETURN NULL;
END;
$$;
CREATE TRIGGER insert_tri
AFTER INSERT ON trigtab
FOR EACH ROW
EXECUTE FUNCTION insert_tri_func();
```
The trigger will be fired as expected:
```sql!
postgres=# insert into trigtab values(1);
NOTICE: Inserted a row into table trigtab. The row count on segment 1 at last ANALYZE is 0 (seg1 127.0.1.1:7003 pid=13137)
INSERT 0 1
postgres=# analyze trigtab;
ANALYZE
postgres=# insert into trigtab values(1);
NOTICE: Inserted a row into table trigtab. The row count on segment 1 at last ANALYZE is 1 (seg1 127.0.1.1:7003 pid=13137)
INSERT 0 1
```
But if we SELECT from a regular (distributed) table in `insert_tri_func()`, we would see errors like:
```sql
ERROR: function cannot execute on a QE slice because it accesses relation "public.t100" (seg1 127.0.1.1:7003 pid=22853)
```
The error is similar to the previous case: a QE cannot access data on another QE, so such a trigger function is invalid.
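For reference, a variant of `insert_tri_func()` that would run into this error might look like the sketch below (assuming `t100` is an ordinary distributed table, as in the error message above):
```sql!
CREATE TABLE t100 (a INT) DISTRIBUTED BY (a);
CREATE OR REPLACE FUNCTION insert_tri_func()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
DECLARE
    rowcount int;
BEGIN
    -- t100 is neither a catalog nor a replicated table, so a QE
    -- cannot see its rows on the other segments.
    SELECT count(*) INTO rowcount FROM t100;
    RAISE NOTICE 'row count of t100: %', rowcount;
    RETURN NULL;
END;
$$;
```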
### UPDATE trigger cannot be used when updating the distribution key
Updating the distribution key may require moving rows to a different segment, which effectively means deleting the row on the current segment and inserting it on another segment. So, similar to the DML limitation mentioned before, this is not allowed when an UPDATE trigger is present. For example (for simplicity, let's re-use the same trigger function from the INSERT example):
```sql!
CREATE TRIGGER update_tri
AFTER UPDATE ON trigtab
FOR EACH ROW
EXECUTE FUNCTION insert_tri_func();
```
Updating the distribution key is disallowed, but updating a normal column is fine:
```sql!
postgres=# UPDATE trigtab SET a = 1;
ERROR: UPDATE on distributed key column not allowed on relation with update triggers
postgres=# UPDATE trigtab SET b = 1;
UPDATE 0
```
### Triggers cannot be fired on AO/CO tables
At the moment, GPDB does not allow **any** trigger on AO/CO tables:
```sql!
CREATE TABLE ao (a INT) USING ao_row;
CREATE TABLE co (a INT) USING ao_column;
-- reuse the function from the previous examples
CREATE TRIGGER trig_ao
AFTER INSERT ON ao
FOR EACH ROW
EXECUTE FUNCTION insert_tri_func();
CREATE TRIGGER trig_co
AFTER INSERT ON co
FOR EACH ROW
EXECUTE FUNCTION insert_tri_func();
postgres=# INSERT INTO ao VALUES(1);
ERROR: feature not supported on appendoptimized relations (seg1 127.0.1.1:7003 pid=13137)
postgres=# INSERT INTO co VALUES(1);
ERROR: Trigger is not supported on AOCS yet (trigger.c:2714) (seg1 127.0.1.1:7003 pid=13137) (trigger.c:2714)
```
### Other general limitations to triggers
Besides those GPDB-specific limitations, triggers are also subject to the upstream limitations, which exist in GPDB as well. For example, you cannot define a trigger function with declared arguments; instead, you have to access the arguments through special trigger function variables like [`TG_ARGV`](https://www.postgresql.org/docs/current/plpgsql-trigger.html). Also, the arguments have to be literal strings (or things that can be converted to strings).
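As a quick illustration (a minimal sketch with hypothetical names, reusing the `trigtab` table from above), an argument is passed as a literal string in `CREATE TRIGGER` and read back through `TG_ARGV`:
```sql!
CREATE FUNCTION notice_tri_func()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    -- The trigger's arguments are only available via TG_ARGV.
    RAISE NOTICE 'trigger % called with argument %', TG_NAME, TG_ARGV[0];
    RETURN NULL;
END;
$$;
CREATE TRIGGER notice_tri
AFTER INSERT ON trigtab
FOR EACH ROW
EXECUTE FUNCTION notice_tri_func('hello');  -- 'hello' becomes TG_ARGV[0]
```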
For other such limitations, refer to the [upstream documentation](https://www.postgresql.org/docs/current/sql-createtrigger.html).
## 2. Statement triggers
Statement triggers are disallowed in GPDB 7 due to concerns over their usability. The first concern is the ambiguity about where the trigger is fired: for DML like INSERT, a statement trigger would be fired on the QEs, but for others like TRUNCATE it would be fired on the QD. It would therefore be difficult for a user to predict what happens when the trigger function executes.
In later releases we might provide a GUC to allow the use of statement triggers for users who are aware of their limitations. Statement triggers that are fired on QEs would share the same limitations as the row trigger functions mentioned before.
In addition, options like [REFERENCING](https://www.postgresql.org/docs/current/sql-createtrigger.html#:~:text=The%20REFERENCING%20option%20enables) for statement triggers cannot work, because a single QE cannot collect a global view of the statement. For example:
```sql!
CREATE TABLE trigtab (id INT, name TEXT);
CREATE OR REPLACE FUNCTION notify_rows_inserted()
RETURNS TRIGGER AS $$
DECLARE
row_count INT;
BEGIN
-- Count the number of rows in the transition table
SELECT count(*) INTO row_count FROM inserted_rows;
-- Raise a notice with the number of inserted rows
RAISE NOTICE 'Inserted % rows into trigtab', row_count;
RETURN NULL;
END;
$$ LANGUAGE plpgsql;
CREATE TRIGGER insert_tri
AFTER INSERT ON trigtab
REFERENCING NEW TABLE AS inserted_rows
FOR EACH STATEMENT
EXECUTE FUNCTION notify_rows_inserted();
```
In the above example, we count the number of rows being inserted into the table using the transition table feature. However, since a QE does not have a global view of the INSERT statement, we could at best print the number of rows per QE (which might be useful in some cases, but it still differs from what a user expects from a statement-level trigger and is error-prone).
## 3. Event triggers
Event triggers are always fired on the QD, so it is perfectly fine to use them in GPDB. [Here are the available DDL events](https://www.postgresql.org/docs/current/event-trigger-definition.html) you can attach a trigger to. Event triggers have a wide range of applicability in GPDB too.
The usage is similar to upstream:
```sql!
CREATE OR REPLACE FUNCTION prevent_drop_table()
RETURNS event_trigger AS $$
BEGIN
RAISE EXCEPTION 'Dropping tables is prohibited!';
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER no_drop_table
ON sql_drop
WHEN tag IN ('DROP TABLE')
EXECUTE FUNCTION prevent_drop_table();
```
This will prevent any DROP TABLE attempt in the database:
```sql!
postgres=# create table t1(a int);
CREATE TABLE
postgres=# drop table t1;
ERROR: Dropping tables is prohibited!
CONTEXT: PL/pgSQL function prevent_drop_table() line 4 at RAISE
```
Note that currently the `table_rewrite` event won't be fired in GPDB for `ALTER TABLE ... SET WITH (REORGANIZE=true)` and `ALTER TABLE ... REPACK BY`. It also won't be triggered by "control commands" such as CLUSTER and VACUUM, though that behavior is the same as upstream. To see which other commands fire `table_rewrite`, refer to [the upstream doc](https://www.postgresql.org/docs/current/event-trigger-matrix.html).
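For a sense of what does work, here is a minimal sketch (table and trigger names are hypothetical): a `table_rewrite` event trigger still fires for ordinary rewriting commands such as changing a column's type, just not for the GPDB-specific commands noted above.
```sql!
CREATE TABLE rewrite_demo (a INT, b INT) DISTRIBUTED BY (a);
CREATE OR REPLACE FUNCTION log_rewrite()
RETURNS event_trigger AS $$
BEGIN
    RAISE NOTICE 'rewriting table %',
        pg_event_trigger_table_rewrite_oid()::regclass;
END;
$$ LANGUAGE plpgsql;
CREATE EVENT TRIGGER log_table_rewrite
ON table_rewrite
EXECUTE FUNCTION log_rewrite();
-- Rewrites the table, so the event trigger is expected to fire.
ALTER TABLE rewrite_demo ALTER COLUMN b TYPE bigint;
-- Also rewrites the table, but currently does not fire the event in GPDB.
ALTER TABLE rewrite_demo SET WITH (REORGANIZE=true);
```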
## Other caveats
* If the trigger function throws an exception, the command that fired it will abort too (even if it is an `AFTER ...` trigger). So it is better to handle exceptions in the trigger function and exit gracefully (see the sketch after this list).
* Trigger functions can be fired recursively, which might lead to bad behavior: for example, if you define an INSERT trigger whose function inserts into the same table, the execution will never end.
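For the first caveat, a minimal sketch of such a defensive trigger function (names hypothetical) could look like this: the body traps its own errors so that the triggering statement is not aborted.
```sql!
CREATE FUNCTION safe_tri_func()
RETURNS TRIGGER
LANGUAGE plpgsql AS $$
BEGIN
    PERFORM 1 / 0;      -- stand-in for anything here that may fail
    RETURN NULL;
EXCEPTION WHEN OTHERS THEN
    -- Log the problem and return normally instead of re-raising,
    -- so the INSERT/UPDATE/DELETE that fired the trigger still succeeds.
    RAISE NOTICE 'trigger body failed: %, continuing', SQLERRM;
    RETURN NULL;
END;
$$;
```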