---
title: EZID search performance
tags: mysql, sql, performance, EZID
---
# EZID Search Performance
References:
for 5.7:
* https://downloads.mysql.com/docs/mysql-perfschema-excerpt-5.7-en.pdf
* https://dev.mysql.com/doc/refman/5.7/en/
for 8.x:
* https://dev.mysql.com/doc/refman/8.0/en/performance-schema-quick-start.html
## Limiting the maxium duration of queries
To limit query execution time globally to a maximum of 30 seconds:
```sql
SET GLOBAL MAX_EXECUTION_TIME=30000;
```
Or for a session:
```sql
SET SESSION MAX_EXECUTION_TIME=30000;
```
Or for a specific query (and showing the resulting timeout error):
```sql
SELECT /*+ MAX_EXECUTION_TIME(1000) */ COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (`ezidapp_searchidentifier`.`publicSearchVisible`
AND `ezidapp_searchidentifier`.`identifier` LIKE 'ark:%');
ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded
```
:::info
**Note:** Command line mysql client must be started with `--comments` option for versions less than 5.7.7, otherwise the optimizer hints are stripped out.
:::
This can also be [enabled in Django](https://cscheng.info/2020/06/03/prevent-long-running-mysql-queries-with-max-execution-time-in-django.html), for example to limit queries issued by Django to a maximum of 15 seconds:
```
DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
...
'OPTIONS': {
'init_command': 'SET max_execution_time=15000'
}
}
}
```
The error will need to be handled by the application.
## Finding slow queries
On `ezid-dev`:
```sql
SELECT COUNT_STAR AS N, AVG_TIMER_WAIT/1000000000000 T_SEC, DIGEST_TEXT
FROM performance_schema.events_statements_summary_by_digest
WHERE SCHEMA_NAME='ezid'
AND DIGEST_TEXT LIKE 'SELECT %'
ORDER BY T_SEC ASC;
```
| N | T_SEC | DIGEST_TEXT |
| -------- | ----------- | ------------ |
| 5 | 1.8042 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 8 | 4.6502 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? ) |
| 3 | 5.0004 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 1 | 5.3192 | SELECT COUNT ( * ) FROM `ezidapp_linkchecker` `el` |
| 2 | 6.8792 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` |
| 4 | 8.8901 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND MATCH ( `keywords` ) AGAINST ( ? IN NATURAL LANGUAGE MODE ) ) |
| 1 | 9.0567 | SELECT * FROM `ezidapp_identifier` |
| 1 | 10.0002 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) |
| 1 | 10.4321 | SELECT COUNT ( * ) FROM `ezidapp_identifier` |
| 1 | 10.5973 | SELECT * FROM `ezidapp_searchidentifier` |
| 1 | 10.9956 | SELECT COUNT ( * ) FROM `ezidapp_storeidentifier` |
| 1 | 11.2573 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` `es` |
| 3 | 12.0008 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 10 | 19.2319 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND MATCH ( `ezidapp_searchidentifier` . `keywords` ) AGAINST ( ? IN BOOLEAN MODE ) ) |
| 1 | 20.3023 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `test_identifier_publicSearchVisible` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `identifier` LIKE ? ) |
| 1 | 22.4831 | SELECT ( SELECT COUNT ( * ) FROM `ezidapp_searchdatacenter` ) AS `searchdatacenter` , ( SELECT COUNT ( * ) FROM `ezidapp_storedatacenter` ) AS `storedatacenter` , ( SELECT COUNT ( * ) FROM `ezidapp_searchgroup` ) AS `searchgroup` , ( SELECT COUNT ( * ) FROM `ezidapp_storegroup` ) AS `storegroup` , ( SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` ) AS `searchidentifier` , ( SELECT COUNT ( * ) FROM `ezidapp_storeidentifier` ) AS `storeidentifier` , ( SELECT COUNT ( * ) FROM `ezidapp_searchprofile` ) AS `searchprofile` , ( SELECT COUNT ( * ) FROM `ezidapp_storeprofile` ) AS `storeprofile` , ( SELECT COUNT ( * ) FROM `ezidapp_searchrealm` ) AS `searchrealm` , ( SELECT COUNT ( * ) FROM `ezidapp_storerealm` ) AS `storerealm` , ( SELECT COUNT ( * ) FROM `ezidapp_searchuser` ) AS `searchuser` , ( SELECT COUNT ( * ) FROM `ezidapp_storeuser` ) AS `storeuser` |
| 1 | 30.0005 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `identifier` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 1 | 30.0008 | SELECT /*+ MAX_EXECUTION_TIME (?) */ COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 13 | 77.1295 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `identifier` LIKE ? ) |
| 1 | 84.8875 | SELECT * FROM `ezidapp_searchidentifier` WHERE `keywords` LIKE ? |
| 8 | 171.6110 | SELECT SQL_NO_CACHE * FROM `backup_ezidapp_searchidentifier` |
| 1 | 182.7086 | SELECT SQL_NO_CACHE * FROM `ezidapp_linkchecker` |
| 1 | 186.9248 | SELECT `metadata` FROM `ezidapp_searchidentifier` WHERE `metadata` IS NOT NULL LIMIT ? |
| 1 | 187.8709 | SELECT `from_unixtime` ( MAX ( `updateTime` ) ) FROM `ezidapp_storeidentifier` `es` |
| 2 | 231.3731 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` `es` GROUP BY `metadata` IS NULL |
| 1 | 237.4589 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` USE INDEX ( `identifier` , `test_identifier_publicSearchVisible` ) WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE ? ) |
| 1 | 413.4580 | SELECT COUNT ( * ) FROM `ezidapp_searchidentifier` WHERE `metadata` IS NULL |
| 4 | 460.4102 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `keywords` LIKE ? ) |
| 2 | 466.2411 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `searchablePublicationYear` BETWEEN ? AND ? ) |
| 5 | 479.4587 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) |
| 1 | 882.3158 | SELECT SQL_NO_CACHE * FROM `tmp_stid2` |
| 1 | 891.1953 | SELECT SQL_NO_CACHE * FROM `ezidapp_storeidentifier` |
| 1 | 1538.2004 | SELECT SQL_NO_CACHE * FROM `ezidapp_searchidentifier` |
| 1 | 2018.2626 | SELECT `id` AS `row_id` , `cm` AS `raw_bytes` FROM `ezidapp_storeidentifier` |
| 1 | 2020.8279 | SELECT `id` AS `row_id` , `cm` AS `raw_bytes` FROM `ezidapp_searchidentifier`|
There's obviously some room for improvement.
### Production queries that take longer than five second to run
See also https://docs.google.com/spreadsheets/d/11SvPAddcrMaipCjs0vSe36CzPfpJfsSp3KJIfiFmPdA/edit#gid=0
| N | T_SEC | DIGEST_TEXT |
| -------- | ----------- | ------------ |
| 3 | 5.9490 | SELECT `ezidapp_linkchecker` . `owner_id` , COUNT (?) AS `total` , COUNT ( CASE WHEN `ezidapp_linkchecker` . `numFailures` <= ? THEN ? ELSE ? END ) AS `age1Month` , COUNT ( CASE WHEN ( `ezidapp_linkchecker` . `numFailures` > ? AND `ezidapp_linkchecker` . `numFailures` <= ? ) THEN ? ELSE ? END ) AS `age2Months` , COUNT ( CASE WHEN ( `ezidapp_linkchecker` . `numFailures` > ? AND `ezidapp_linkchecker` . `numFailures` <= ? ) THEN ? ELSE ? END ) AS `age3Months` , COUNT ( CASE WHEN `ezidapp_linkchecker` . `numFailures` > ? THEN ? ELSE ? END ) AS `older` FROM `ezidapp_linkchecker` WHERE `ezidapp_linkchecker` . `numFailures` >= ? GROUP BY `ezidapp_linkchecker` . `owner_id` ORDER BY `total` DESC |
| 49 | 6.0008 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `oaiVisible` = ? AND `ezidapp_searchidentifier` . `updateTime` > ? ) |
| 5 | 7.1761 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) |
| 887 | 7.6282 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` INNER JOIN `ezidapp_searchuser` ON ( `ezidapp_searchidentifier` . `owner_id` = `ezidapp_searchuser` . `id` ) WHERE ( `ezidapp_searchuser` . `username` = ? AND MATCH ( `ezidapp_searchidentifier` . `keywords` ) AGAINST ( ? IN BOOLEAN MODE ) ) |
| 65 | 13.8912 | SELECT COUNT ( * ) AS `__count` FROM `ezidapp_searchidentifier` WHERE ( `ezidapp_searchidentifier` . `publicSearchVisible` = ? AND `ezidapp_searchidentifier` . `identifier` LIKE BINARY ? ) |
| 42 | 156.3786 | SELECT `ezidapp_storeidentifier` . `id` , `ezidapp_storeidentifier` . `identifier` FROM `ezidapp_storeidentifier` WHERE ( `ezidapp_storeidentifier` . `identifier` LIKE BINARY ? AND `ezidapp_storeidentifier` . `createTime` <= ? ) |
## Search for identifier using UI
Search for `ark:/87925/drs1.iberian.100191`
No additional parameters.
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`publicSearchVisible` = 1
AND
`ezidapp_searchidentifier`.`identifier`
LIKE BINARY 'ark:/87925/drs1.iberian.100191%'
)
+---------+
| __count |
+---------+
| 1 |
+---------+
1 row in set (44.22 sec)
```
vs.
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`publicSearchVisible` = 1
AND
`ezidapp_searchidentifier`.`identifier`
= 'ark:/87925/drs1.iberian.100191'
)
+---------+
| __count |
+---------+
| 1 |
+---------+
1 row in set (0.00 sec)
```
```json
{
"query_block": {
"select_id": 1,
"cost_info": {
"query_cost": "2657518.57"
},
"table": {
"table_name": "ezidapp_searchidentifier",
"access_type": "ref",
"possible_keys": [
"identifier",
"ezidapp_searchidentifie_publicSearchVisible_58de9f6f00b8058e_idx",
"ezidapp_searchidentifie_publicSearchVisible_1932465b0335635c_idx",
"ezidapp_searchidentifie_publicSearchVisible_47b0a294295f5ef5_idx",
"ezidapp_searchidentifie_publicSearchVisible_1e447c57e83c8d5d_idx",
"ezidapp_searchidentifie_publicSearchVisible_47396846c619370f_idx",
"ezidapp_searchidentifie_publicSearchVisible_117042133b78a88e_idx",
"ezidapp_searchidentifier_publicSearchVisible_6807647c6d8cb52_idx",
"ezidapp_searchidentifie_publicSearchVisible_2e067bd0a9494a38_idx"
],
"key": "ezidapp_searchidentifie_publicSearchVisible_58de9f6f00b8058e_idx",
"used_key_parts": [
"publicSearchVisible"
],
"key_length": "1",
"ref": [
"const"
],
"rows_examined_per_scan": 11491968,
"rows_produced_per_join": 1276757,
"filtered": "11.11",
"using_index": true,
"cost_info": {
"read_cost": "359124.97",
"eval_cost": "255351.54",
"prefix_cost": "2657518.57",
"data_read_per_join": "9G"
},
"used_columns": [
"identifier",
"publicSearchVisible"
],
"attached_condition": "(`ezid`.`ezidapp_searchidentifier`.`identifier` like <cache>(cast('ark:/87925/drs1.iberian.100191%' as char charset binary)))"
}
}
}
```
Still really slow without the count:
```sql
SELECT identifier AS `id`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`identifier` LIKE BINARY
'ark:/87925/drs1.iberian.100191%' ) ;
+--------------------------------+
| id |
+--------------------------------+
| ark:/87925/drs1.iberian.100191 |
+--------------------------------+
1 row in set (52.18 sec)
```
Removing `BINARY` does the job:
```sql
SELECT identifier AS `id`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`identifier` LIKE
'ark:/87925/drs1.iberian.100191%' ) ;
+--------------------------------+
| id |
+--------------------------------+
| ark:/87925/drs1.iberian.100191 |
+--------------------------------+
1 row in set (0.03 sec)
```
The `BINARY` keyword ensures case sensitive comparison is performed. However, the collation of the identifier column is `ascii_bin` which is case sensitive, so it the `BINARY` keyword is completely superfluous and only serves to promote DOS.
To verify, search for the same but with case of a character changed:
```sql
SELECT identifier AS `id`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`identifier` LIKE
'ark:/87925/drs1.Iberian.100191%' ) ;
Empty set (0.00 sec)
```
Collations used in columns of the `searchidentifier` table:
```sql
SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, COLLATION_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'ezidapp_searchidentifier';
+--------------+--------------------------+---------------------------+--------------------+
| TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | COLLATION_NAME |
+--------------+--------------------------+---------------------------+--------------------+
| ezid | ezidapp_searchidentifier | id | NULL |
| ezid | ezidapp_searchidentifier | identifier | ascii_bin |
| ezid | ezidapp_searchidentifier | createTime | NULL |
| ezid | ezidapp_searchidentifier | updateTime | NULL |
| ezid | ezidapp_searchidentifier | status | utf8_general_ci |
| ezid | ezidapp_searchidentifier | unavailableReason | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | exported | NULL |
| ezid | ezidapp_searchidentifier | crossrefStatus | utf8_general_ci |
| ezid | ezidapp_searchidentifier | crossrefMessage | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | target | utf8_general_ci |
| ezid | ezidapp_searchidentifier | cm | NULL |
| ezid | ezidapp_searchidentifier | agentRole | utf8_general_ci |
| ezid | ezidapp_searchidentifier | isTest | NULL |
| ezid | ezidapp_searchidentifier | owner_id | NULL |
| ezid | ezidapp_searchidentifier | ownergroup_id | NULL |
| ezid | ezidapp_searchidentifier | datacenter_id | NULL |
| ezid | ezidapp_searchidentifier | profile_id | NULL |
| ezid | ezidapp_searchidentifier | searchableTarget | utf8_general_ci |
| ezid | ezidapp_searchidentifier | resourceCreator | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourceTitle | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourcePublisher | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourcePublicationDate | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | searchablePublicationYear | NULL |
| ezid | ezidapp_searchidentifier | resourceType | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | searchableResourceType | utf8_general_ci |
| ezid | ezidapp_searchidentifier | keywords | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourceCreatorPrefix | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourceTitlePrefix | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | resourcePublisherPrefix | utf8mb4_general_ci |
| ezid | ezidapp_searchidentifier | hasMetadata | NULL |
| ezid | ezidapp_searchidentifier | publicSearchVisible | NULL |
| ezid | ezidapp_searchidentifier | oaiVisible | NULL |
| ezid | ezidapp_searchidentifier | hasIssues | NULL |
| ezid | ezidapp_searchidentifier | linkIsBroken | NULL |
+--------------+--------------------------+---------------------------+--------------------+
```
## Keyword querying
A general test search in the default search box results in a search being executed against the `ezidapp_searchidentifier.keywords` column. For example, searching for "water" results in the initial query:
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`publicSearchVisible`
AND
`ezidapp_searchidentifier`.`keywords` LIKE '%water%'
);
```
This query was terminated after 2 minutes.
Using the full text index instead:
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`publicSearchVisible`
AND
MATCH(keywords) AGAINST("+water" IN NATURAL LANGUAGE MODE)
);
```
Is still a bit slow, but completes in a somewhat acceptable ~3 seconds.
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE (
`ezidapp_searchidentifier`.`publicSearchVisible` = 1
AND
MATCH (`ezidapp_searchidentifier`.`keywords`)
AGAINST ('+water' IN BOOLEAN MODE)
);
```
The full text part of the query completes quickly, about 3 sec.
The `publicSearchVisible` part is very slow:
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 );
+----------+
| __count |
+----------+
| 21210882 |
+----------+
1 row in set (30.88 sec)
```
Creating a simple index on `publicSearchVisible` reduced this to about 3sec. Hence a recommendation is to create a simple index on the `publicSearchVisible` column since this participates in every public query generated by the search UI.
## `publicSearchVisible`
Any anonymous queries include a match against the column `publicSearchVisible` for a value of 1. This query with no additional restrictions takes about 30 seconds to complete.
```sql
SELECT COUNT(*) AS `__count`
FROM `ezidapp_searchidentifier`
WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1 );
+----------+
| __count |
+----------+
| 21210882 |
+----------+
1 row in set (30.88 sec)
```
Adding a simple index for that column:
```sql
CREATE INDEX test_identifier_publicSearchVisible
ON ezidapp_searchidentifier (publicSearchVisible);
```
reduces that time to about 3 seconds:
```sql
+----------+
| __count |
+----------+
| 21208867 |
+----------+
1 row in set (3.07 sec)
```
But still doesn't help with more combination queries like searching for identifiers that start with "`ark:`".
Creating a combination index on `identifier` and `publicSearchVisible` helps a lot.
```sql
SELECT COUNT(*) AS __count FROM ezidapp_searchidentifier
WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1
AND identifier LIKE 'ark:%');
+----------+
| __count |
+----------+
| 20985129 |
+----------+
1 row in set (6 min 32.59 sec)
```
```sql
CREATE INDEX test2_identifier_publicSearchVisible
ON ezidapp_searchidentifier (publicSearchVisible, identifier);
```
```sql
SELECT COUNT(*) AS __count FROM ezidapp_searchidentifier
USE INDEX (test2_identifier_publicSearchVisible)
WHERE ( `ezidapp_searchidentifier`.`publicSearchVisible` = 1
AND identifier LIKE 'ark:%');
+----------+
| __count |
+----------+
| 20985129 |
+----------+
1 row in set (20.30 sec)
```
Though is still very slow unless the prefix has sufficient characters: