# EZID Operational Error
[TOC]
## Issue description
- EZID processes can't connect to the Amazon RDS (Relational Database Service) that is running MySQL and keeps receiving one of [these errors](#Errors).
- EZID was down for ~3 minutes and operations resumed as soon as the RDS was up.
## Observations:
- Something terribly wrong happend at 12:22 PM (Pacific Time) on the `rds-ias-ezid-search` production server.
- The memory utilization increased from 90% to 100% resulting to a server crash.
- [View RDS Metrics here](https://metrics.librato.com/s/public/azpakcmwd?duration=900&end_time=1566502200&source=%2Ards%2Aezid%2Aprd%2A)
- [View EZID Metrics here](https://metrics.librato.com/s/public/5t9drxwai?duration=3600&end_time=1566504000&source=%2Aezid-prd%2A)
> Note: We had observed similar service failure on August 16, 2019
- The server came back up at ~12:25 PM (Pacific Time)
- Ways to regenerate the error:
- ~~not known~~
## Errors
### Error 1
:::danger
OperationalError: (2013, 'Lost connection to MySQL server during query')
:::
- Details:
| variable | value |
|:------ |:----------- |
| Error number | 2013 |
| Symbol | CR_SERVER_LOST |
| Message | Lost connection to MySQL server during query |
- Encountered by: EZID API/UI
- Logs: [See here](#EZID-APIUI-logs)
- Potential reasons for failure:
- Networking issues at RDS server.
- If millions of rows are being sent as part of one or more queries.
- Aborted connects?
- No aborted connects were encountered by the query on `PROD`:
`SHOW GLOBAL STATUS LIKE 'Aborted_connects';`
- See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html) for more details
### Error 2
:::danger
OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd....' (111)")
:::
- Details:
| variable | value |
|:------ |:----------- |
| Error number | 2003 |
| Symbol | CR_CONN_HOST_ERROR |
| Message | Can't connect to MySQL server on '%s' (%d) |
- Encountered by: EZID API/UI
- Logs: [See here](#EZID-APIUI-logs)
- Potential reasons for failure:
- See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/can-not-connect-to-server.html) for more details.
### Error 3
:::danger
OperationalError: (2006, 'MySQL server has gone away')
:::
- Details:
| variable | value |
|:------ |:----------- |
| Error number | 2006 |
| Symbol | CR_SERVER_GONE_ERROR |
| Message | MySQL server has gone away |
- Encountered by: EZID Link Checker
- Logs: [See Here](#EZID-Link-Checker-logs)
- Potential reasons for failure:
- You have encountered a bug where the server died while executing the query.
- If you send a query to the server that is incorrect or too large.
- See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/gone-away.html) for more details.
## Possible solutions?
:::info
Given that all the services are encountering issues during a specifc time frame, it might be useful to check if they are failing over any **AWS RDS Events** and reviewing **MySQL error logs**.
:::
How To:
[Check AWS RDS Events](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ListEvents.html)
[Review MySQL error logs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html#USER_LogAccess.MySQL.Errorlog)
#### Handle specific cases
If the failure occurs due to any of the following reasons:
> 1. connection times out
- If a connection times out, increase the timeouts for MySQL by increasing the **wait_timeout** and **interactive_timeout** parameters by using a custom parameter group.
> 2. the query that is generating the error is retrieving a large data set
- Increase the **max_allowed_packet** size parameter by using a custom parameter group.
> 3. If millions of rows are being sent as part of one or more queries
- Try increasing **net_read_timeout** from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.
> 4. You have encountered a bug where the server died while executing the query.
- Get the query that crashed the server and try to fix it.
## Logs
### EZID API/UI logs
```log
/ezid/apache/ezid/SITE/logs/transactions.log
2019-08-22 12:22:00,199 1d3ae118c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/115059 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:00,210 1d3ae118c51211e9b6b30a9901e9a631 END BADREQUEST
2019-08-22 12:22:11,079 23b70012c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/900617 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:11,091 23b70012c51211e9b6b30a9901e9a631 END BADREQUEST
2019-08-22 12:22:31,959 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=0 operationCount=6
2019-08-22 12:22:37,637 338b7040c51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901508 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:37,649 338b7040c51211e993c60a9901e9a631 END BADREQUEST
2019-08-22 12:22:43,378 36f76dbac51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/107166 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:43,390 36f76dbac51211e9932d0a9901e9a631 END BADREQUEST
2019-08-22 12:22:48,183 39d4ad54c51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/904436 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:48,195 39d4ad54c51211e9932d0a9901e9a631 END BADREQUEST
2019-08-22 12:22:49,393 3a8d395ac51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi
2019-08-22 12:22:49,410 3a8d395ac51211e993c60a9901e9a631 END SUCCESS 1699
2019-08-22 12:22:49,411 3a900374c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi
2019-08-22 12:22:49,415 3a900374c51211e993c60a9901e9a631 END SUCCESS 10
2019-08-22 12:22:53,155 3ccb4f54c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:22:53,405 3ccb4f54c51211e993c60a9901e9a631 END SUCCESS 0
2019-08-22 12:22:53,406 3cf19f42c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:22:53,634 3cf19f42c51211e993c60a9901e9a631 END SUCCESS 0
2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 BEGIN mintIdentifier ark:/99999/fk4 merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 END SUCCESS ark:/99999/fk4ff51c9m
2019-08-22 12:23:04,439 438515e6c51211e993c60a9901e9a631 BEGIN createIdentifier ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,473 438515e6c51211e993c60a9901e9a631 END SUCCESS
2019-08-22 12:23:04,546 439577d8c51211e993c60a9901e9a631 BEGIN setMetadata ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,577 439577d8c51211e993c60a9901e9a631 END SUCCESS
2019-08-22 12:23:11,315 479e652ec51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901440 anonymous anonymous anonymous anonymous False
2019-08-22 12:23:11,328 479e652ec51211e993c60a9901e9a631 END BADREQUEST
2019-08-22 12:23:17,725 4b704bc2c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:32,739 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=1 operationCount=12
2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 END SUCCESS ark:/88435/dsp011v53k084w
2019-08-22 12:23:36,202 5673cb16c51211e986c90a9901e9a631 BEGIN createIdentifier ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:23:39,296 5673cb16c51211e986c90a9901e9a631 END SUCCESS
2019-08-22 12:23:39,456 58645a1cc51211e986c90a9901e9a631 BEGIN setMetadata ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp011v53k084w
2019-08-22 12:23:43,197 58645a1cc51211e986c90a9901e9a631 END SUCCESS
2019-08-22 12:23:44,279 5b441bfac51211e9932d0a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:44,847 5b9ac414c51211e9b6b30a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:46,689 5cb408ecc51211e9882d0a9901e9a631 BEGIN getMetadata doi:10.7284/117233 anonymous anonymous anonymous anonymous False
2019-08-22 12:23:47,616 5cb408ecc51211e9882d0a9901e9a631 END BADREQUEST
2019-08-22 12:23:48,538 5b9ac414c51211e9b6b30a9901e9a631 END SUCCESS 0
2019-08-22 12:23:48,539 5dce22b2c51211e9b6b30a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:50,488 5ef7a80cc51211e986c90a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi
2019-08-22 12:23:50,833 5dce22b2c51211e9b6b30a9901e9a631 END SUCCESS 0
2019-08-22 12:23:50,856 5ef7a80cc51211e986c90a9901e9a631 END SUCCESS 1699
2019-08-22 12:23:50,857 5f2fddeec51211e986c90a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi
2019-08-22 12:23:52,535 5f2fddeec51211e986c90a9901e9a631 END SUCCESS 10
2019-08-22 12:23:59,568 4b704bc2c51211e993c60a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query')
2019-08-22 12:23:59,568 5b441bfac51211e9932d0a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query')
2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:00,873 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,864 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,910 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,928 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,956 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,941 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,960 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,991 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,917 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,963 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,994 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:16,026 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:25,117 739bb096c51211e9934b0a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:24:25,118 739bb096c51211e9934b0a9901e9a631 END SUCCESS ark:/88435/dsp01x346d703k
2019-08-22 12:24:25,118 739bc1c6c51211e9934b0a9901e9a631 BEGIN createIdentifier ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:24:25,180 739bc1c6c51211e9934b0a9901e9a631 END SUCCESS
2019-08-22 12:24:25,349 73bf000ac51211e9934b0a9901e9a631 BEGIN setMetadata ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp01x346d703k
2019-08-22 12:24:25,387 73bf000ac51211e9934b0a9901e9a631 END SUCCESS
```
### EZID Link Checker logs
```log
/ezid/apache/ezid/SITE/logs/link-checker.log
2019-08-22 11:11:43,928 begin processing
2019-08-22 12:01:44,933 end processing, checked 3094 links at 1.0 links/s, slept 03:46:04
2019-08-22 12:01:44,933 begin load workset
2019-08-22 12:01:45,895 end load workset, 10 owners (7 capped), 3546 links,
500 unvisited links, 2433 good links (oldest=229d), 613 bad links (oldest=2d)
2019-08-22 12:01:45,896 begin processing
2019-08-22 12:23:59,580 unhandled exception in worker thread
```
## Useful links:
1. [AWS rds-mysql-server-gone-away ](https://aws.amazon.com/premiumsupport/knowledge-center/rds-mysql-server-gone-away/)
2. [dev.mysql.com : MySQL server has gone away ](https://dev.mysql.com/doc/refman/5.7/en/gone-away.html)
3. [dev.mysql.com : Can't connect to MySQL server ](https://dev.mysql.com/doc/refman/5.7/en/can-not-connect-to-server.html)
4. [dev.mysql.com : Lost connection to MySQL server ](https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html)