--- title: EZID search performance tags: mysql, sql, performance, EZID --- # EZID Search Performance References: for 5.7: * https://downloads.mysql.com/docs/mysql-perfschema-excerpt-5.7-en.pdf * https://dev.mysql.com/doc/refman/5.7/en/ for 8.x: * https://dev.mysql.com/doc/refman/8.0/en/performance-schema-quick-start.html ## Limiting the maxium duration of queries To limit query execution time globally to a maximum of 30 seconds: ```sql SET GLOBAL MAX_EXECUTION_TIME=30000; ``` Or for a session: ```sql SET SESSION MAX_EXECUTION_TIME=30000; ``` Or for a specific query (and showing the resulting timeout error): ```sql SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE (`ezidapp_searchidentifier`.`publicSearchVisible` AND `ezidapp_searchidentifier`.`identifier` LIKE 'ark:%'); ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded ``` :::info **Note:** Command line mysql client must be started with `--comments` option for versions less than 5.7.7, otherwise the optimizer hints are stripped out. ::: This can also be [enabled in Django](https://cscheng.info/2020/06/03/prevent-long-running-mysql-queries-with-max-execution-time-in-django.html), for example to limit queries issued by Django to a maximum of 15 seconds: ``` DATABASES = { 'default': { 'ENGINE': 'django.db.backends.mysql', ... 'OPTIONS': { 'init_command': 'SET max_execution_time=15000' } } } ``` The error will need to be handled by the application. ## Finding slow queries On `ezid-dev`: ```sql SELECT COUNT_STAR AS N, AVG_TIMER_WAIT/1000000000000 T_SEC, DIGEST_TEXT FROM performance_schema.events_statements_summary_by_digest WHERE SCHEMA_NAME='ezid' AND DIGEST_TEXT LIKE 'SELECT %' ORDER BY T_SEC ASC; ``` | N | T_SEC | DIGEST_TEXT | | -------- | ----------- | ------------ | | 5 | 1.8042 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 8 | 4.6502 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? ) | | 3 | 5.0004 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 1 | 5.3192 | SELECT COUNT ( * ) FROM `ezidapp_linkchecker` `el` | | 2 | 6.8792 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` | | 4 | 8.8901 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND MATCH ( `keywords` ) AGAINST ( ? IN NATURAL LANGUAGE MODE ) ) | | 1 | 9.0567 | SELECT * FROM `ezidapp_identifier` | | 1 | 10.0002 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) | | 1 | 10.4321 | SELECT COUNT ( * ) FROM `ezidapp_identifier` | | 1 | 10.5973 | SELECT * FROM `ezidapp_searchidentifier` | | 1 | 10.9956 | SELECT COUNT ( * ) FROM `ezidapp_storeidentifier` | | 1 | 11.2573 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` `es` | | 3 | 12.0008 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 10 | 19.2319 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND MATCH ( `ezidapp_searchidentifier` . `keywords` ) AGAINST ( ? IN BOOLEAN MODE ) ) | | 1 | 20.3023 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `test_identifier_publicSearchVisible` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `identifier` LIKE ? ) | | 1 | 22.4831 | SELECT ( SELECT COUNT ( * ) FROM `ezidapp_searchdatacenter` ) AS `searchdatacenter` , ( SELECT COUNT ( * ) FROM `ezidapp_storedatacenter` ) AS `storedatacenter` , ( SELECT COUNT ( * ) FROM `ezidapp_searchgroup` ) AS `searchgroup` , ( SELECT COUNT ( * ) FROM `ezidapp_storegroup` ) AS `storegroup` , ( SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` ) AS `searchidentifier` , ( SELECT COUNT ( * ) FROM `ezidapp_storeidentifier` ) AS `storeidentifier` , ( SELECT COUNT ( * ) FROM `ezidapp_searchprofile` ) AS `searchprofile` , ( SELECT COUNT ( * ) FROM `ezidapp_storeprofile` ) AS `storeprofile` , ( SELECT COUNT ( * ) FROM `ezidapp_searchrealm` ) AS `searchrealm` , ( SELECT COUNT ( * ) FROM `ezidapp_storerealm` ) AS `storerealm` , ( SELECT COUNT ( * ) FROM `ezidapp_searchuser` ) AS `searchuser` , ( SELECT COUNT ( * ) FROM `ezidapp_storeuser` ) AS `storeuser` | | 1 | 30.0005 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `identifier` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 1 | 30.0008 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 13 | 77.1295 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `identifier` LIKE ? ) | | 1 | 84.8875 | SELECT * FROM `ezidapp_searchidentifier` WHERE `keywords` LIKE ? | | 8 | 171.6110 | SELECT SQL_NO_CACHE * FROM `backup_ezidapp_searchidentifier` | | 1 | 182.7086 | SELECT SQL_NO_CACHE * FROM `ezidapp_linkchecker` | | 1 | 186.9248 | SELECT `metadata` FROM `ezidapp_searchidentifier` WHERE `metadata` IS NOT NULL LIMIT ? | | 1 | 187.8709 | SELECT `from_unixtime` ( MAX ( `updateTime` ) ) FROM `ezidapp_storeidentifier` `es` | | 2 | 231.3731 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` `es` GROUP BY `metadata` IS NULL | | 1 | 237.4589 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `identifier` , `test_identifier_publicSearchVisible` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) | | 1 | 413.4580 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` WHERE `metadata` IS NULL | | 4 | 460.4102 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `keywords` LIKE ? ) | | 2 | 466.2411 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `searchablePublicationYear` BETWEEN ? AND ? ) | | 5 | 479.4587 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) | | 1 | 882.3158 | SELECT SQL_NO_CACHE * FROM `tmp_stid2` | | 1 | 891.1953 | SELECT SQL_NO_CACHE * FROM `ezidapp_storeidentifier` | | 1 | 1538.2004 | SELECT SQL_NO_CACHE * FROM `ezidapp_searchidentifier` | | 1 | 2018.2626 | SELECT `id` AS `row_id` , `cm` AS `raw_bytes` FROM `ezidapp_storeidentifier` | | 1 | 2020.8279 | SELECT `id` AS `row_id` , `cm` AS `raw_bytes` FROM `ezidapp_searchidentifier`| There's obviously some room for improvement. ### Production queries that take longer than five second to run See also https://docs.google.com/spreadsheets/d/11SvPAddcrMaipCjs0vSe36CzPfpJfsSp3KJIfiFmPdA/edit#gid=0 | N | T_SEC | DIGEST_TEXT | | -------- | ----------- | ------------ | | 3 | 5.9490 | SELECT `ezidapp_linkchecker` . `owner_id` , COUNT (?) AS `total` , COUNT ( CASE WHEN `ezidapp_linkchecker` . `numFailures` <= ? THEN ? ELSE ? END ) AS `age1Month` , COUNT ( CASE WHEN ( `ezidapp_linkchecker` . `numFailures` > ? AND `ezidapp_linkchecker` . `numFailures` <= ? ) THEN ? ELSE ? END ) AS `age2Months` , COUNT ( CASE WHEN ( `ezidapp_linkchecker` . `numFailures` > ? AND `ezidapp_linkchecker` . `numFailures` <= ? ) THEN ? ELSE ? END ) AS `age3Months` , COUNT ( CASE WHEN `ezidapp_linkchecker` . `numFailures` > ? THEN ? ELSE ? END ) AS `older` FROM `ezidapp_linkchecker` WHERE `ezidapp_linkchecker` . `numFailures` >= ? GROUP BY `ezidapp_linkchecker` . `owner_id` ORDER BY `total` DESC | | 49 | 6.0008 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `oaiVisible` = ? AND `ezidapp_searchidentifier` . `updateTime` > ? ) | | 5 | 7.1761 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) | | 887 | 7.6282 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` INNER JOIN `ezidapp_searchuser` ON ( `ezidapp_searchidentifier` . `owner_id` = `ezidapp_searchuser` . `id` ) WHERE ( `ezidapp_searchuser` . `username` = ? AND MATCH ( `ezidapp_searchidentifier` . `keywords` ) AGAINST ( ? IN BOOLEAN MODE ) ) | | 65 | 13.8912 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) | | 42 | 156.3786 | SELECT `ezidapp_storeidentifier` . `id` , `ezidapp_storeidentifier` . `identifier` FROM `ezidapp_storeidentifier` WHERE ( `ezidapp_storeidentifier` . `identifier` LIKE BINARY ? AND `ezidapp_storeidentifier` . `createTime` <= ? ) | ## Search for identifier using UI Search for `ark:/87925/drs1.iberian.100191` No additional parameters. ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 AND `ezidapp_searchidentifier`.`identifier` LIKE BINARY 'ark:/87925/drs1.iberian.100191%' ) +---------+ | __count | +---------+ | 1 | +---------+ 1 row in set (44.22 sec) ``` vs. ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 AND `ezidapp_searchidentifier`.`identifier` = 'ark:/87925/drs1.iberian.100191' ) +---------+ | __count | +---------+ | 1 | +---------+ 1 row in set (0.00 sec) ``` ```json { "query_block": { "select_id": 1, "cost_info": { "query_cost": "2657518.57" }, "table": { "table_name": "ezidapp_searchidentifier", "access_type": "ref", "possible_keys": [ "identifier", "ezidapp_searchidentifie_publicSearchVisible_58de9f6f00b8058e_idx", "ezidapp_searchidentifie_publicSearchVisible_1932465b0335635c_idx", "ezidapp_searchidentifie_publicSearchVisible_47b0a294295f5ef5_idx", "ezidapp_searchidentifie_publicSearchVisible_1e447c57e83c8d5d_idx", "ezidapp_searchidentifie_publicSearchVisible_47396846c619370f_idx", "ezidapp_searchidentifie_publicSearchVisible_117042133b78a88e_idx", "ezidapp_searchidentifier_publicSearchVisible_6807647c6d8cb52_idx", "ezidapp_searchidentifie_publicSearchVisible_2e067bd0a9494a38_idx" ], "key": "ezidapp_searchidentifie_publicSearchVisible_58de9f6f00b8058e_idx", "used_key_parts": [ "publicSearchVisible" ], "key_length": "1", "ref": [ "const" ], "rows_examined_per_scan": 11491968, "rows_produced_per_join": 1276757, "filtered": "11.11", "using_index": true, "cost_info": { "read_cost": "359124.97", "eval_cost": "255351.54", "prefix_cost": "2657518.57", "data_read_per_join": "9G" }, "used_columns": [ "identifier", "publicSearchVisible" ], "attached_condition": "(`ezid`.`ezidapp_searchidentifier`.`identifier` like <cache>(cast('ark:/87925/drs1.iberian.100191%' as char charset binary)))" } } } ``` Still really slow without the count: ```sql SELECT identifier AS `id` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`identifier` LIKE BINARY 'ark:/87925/drs1.iberian.100191%' ) ; +--------------------------------+ | id | +--------------------------------+ | ark:/87925/drs1.iberian.100191 | +--------------------------------+ 1 row in set (52.18 sec) ``` Removing `BINARY` does the job: ```sql SELECT identifier AS `id` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`identifier` LIKE 'ark:/87925/drs1.iberian.100191%' ) ; +--------------------------------+ | id | +--------------------------------+ | ark:/87925/drs1.iberian.100191 | +--------------------------------+ 1 row in set (0.03 sec) ``` The `BINARY` keyword ensures case sensitive comparison is performed. However, the collation of the identifier column is `ascii_bin` which is case sensitive, so it the `BINARY` keyword is completely superfluous and only serves to promote DOS. To verify, search for the same but with case of a character changed: ```sql SELECT identifier AS `id` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`identifier` LIKE 'ark:/87925/drs1.Iberian.100191%' ) ; Empty set (0.00 sec) ``` Collations used in columns of the `searchidentifier` table: ```sql SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = 'ezidapp_searchidentifier'; +--------------+--------------------------+---------------------------+--------------------+ | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | COLLATION_NAME | +--------------+--------------------------+---------------------------+--------------------+ | ezid | ezidapp_searchidentifier | id | NULL | | ezid | ezidapp_searchidentifier | identifier | ascii_bin | | ezid | ezidapp_searchidentifier | createTime | NULL | | ezid | ezidapp_searchidentifier | updateTime | NULL | | ezid | ezidapp_searchidentifier | status | utf8_general_ci | | ezid | ezidapp_searchidentifier | unavailableReason | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | exported | NULL | | ezid | ezidapp_searchidentifier | crossrefStatus | utf8_general_ci | | ezid | ezidapp_searchidentifier | crossrefMessage | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | target | utf8_general_ci | | ezid | ezidapp_searchidentifier | cm | NULL | | ezid | ezidapp_searchidentifier | agentRole | utf8_general_ci | | ezid | ezidapp_searchidentifier | isTest | NULL | | ezid | ezidapp_searchidentifier | owner_id | NULL | | ezid | ezidapp_searchidentifier | ownergroup_id | NULL | | ezid | ezidapp_searchidentifier | datacenter_id | NULL | | ezid | ezidapp_searchidentifier | profile_id | NULL | | ezid | ezidapp_searchidentifier | searchableTarget | utf8_general_ci | | ezid | ezidapp_searchidentifier | resourceCreator | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourceTitle | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourcePublisher | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourcePublicationDate | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | searchablePublicationYear | NULL | | ezid | ezidapp_searchidentifier | resourceType | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | searchableResourceType | utf8_general_ci | | ezid | ezidapp_searchidentifier | keywords | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourceCreatorPrefix | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourceTitlePrefix | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | resourcePublisherPrefix | utf8mb4_general_ci | | ezid | ezidapp_searchidentifier | hasMetadata | NULL | | ezid | ezidapp_searchidentifier | publicSearchVisible | NULL | | ezid | ezidapp_searchidentifier | oaiVisible | NULL | | ezid | ezidapp_searchidentifier | hasIssues | NULL | | ezid | ezidapp_searchidentifier | linkIsBroken | NULL | +--------------+--------------------------+---------------------------+--------------------+ ``` ## Keyword querying A general test search in the default search box results in a search being executed against the `ezidapp_searchidentifier.keywords` column. For example, searching for "water" results in the initial query: ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` AND `ezidapp_searchidentifier`.`keywords` LIKE '%water%' ); ``` This query was terminated after 2 minutes. Using the full text index instead: ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` AND MATCH(keywords) AGAINST("+water" IN NATURAL LANGUAGE MODE) ); ``` Is still a bit slow, but completes in a somewhat acceptable ~3 seconds. ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 AND MATCH (`ezidapp_searchidentifier`.`keywords`) AGAINST ('+water' IN BOOLEAN MODE) ); ``` The full text part of the query completes quickly, about 3 sec. The `publicSearchVisible` part is very slow: ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 ); +----------+ | __count | +----------+ | 21210882 | +----------+ 1 row in set (30.88 sec) ``` Creating a simple index on `publicSearchVisible` reduced this to about 3sec. Hence a recommendation is to create a simple index on the `publicSearchVisible` column since this participates in every public query generated by the search UI. ## `publicSearchVisible` Any anonymous queries include a match against the column `publicSearchVisible` for a value of 1. This query with no additional restrictions takes about 30 seconds to complete. ```sql SELECT COUNT(*) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 ); +----------+ | __count | +----------+ | 21210882 | +----------+ 1 row in set (30.88 sec) ``` Adding a simple index for that column: ```sql CREATE INDEX test_identifier_publicSearchVisible ON ezidapp_searchidentifier (publicSearchVisible); ``` reduces that time to about 3 seconds: ```sql +----------+ | __count | +----------+ | 21208867 | +----------+ 1 row in set (3.07 sec) ``` But still doesn't help with more combination queries like searching for identifiers that start with "`ark:`". Creating a combination index on `identifier` and `publicSearchVisible` helps a lot. ```sql SELECT COUNT(*) AS __count FROM ezidapp_searchidentifier WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 AND identifier LIKE 'ark:%'); +----------+ | __count | +----------+ | 20985129 | +----------+ 1 row in set (6 min 32.59 sec) ``` ```sql CREATE INDEX test2_identifier_publicSearchVisible ON ezidapp_searchidentifier (publicSearchVisible, identifier); ``` ```sql SELECT COUNT(*) AS __count FROM ezidapp_searchidentifier USE INDEX (test2_identifier_publicSearchVisible) WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 AND identifier LIKE 'ark:%'); +----------+ | __count | +----------+ | 20985129 | +----------+ 1 row in set (20.30 sec) ``` Though is still very slow unless the prefix has sufficient characters: