# EZID Operational Error [TOC] ## Issue description - EZID processes can't connect to the Amazon RDS (Relational Database Service) that is running MySQL and keeps receiving one of [these errors](#Errors). - EZID was down for ~3 minutes and operations resumed as soon as the RDS was up. ## Observations: - Something terribly wrong happend at 12:22 PM (Pacific Time) on the `rds-ias-ezid-search` production server. - The memory utilization increased from 90% to 100% resulting to a server crash. - [View RDS Metrics here](https://metrics.librato.com/s/public/azpakcmwd?duration=900&end_time=1566502200&source=%2Ards%2Aezid%2Aprd%2A) - [View EZID Metrics here](https://metrics.librato.com/s/public/5t9drxwai?duration=3600&end_time=1566504000&source=%2Aezid-prd%2A) > Note: We had observed similar service failure on August 16, 2019 - The server came back up at ~12:25 PM (Pacific Time) - Ways to regenerate the error: - ~~not known~~ ## Errors ### Error 1 :::danger OperationalError: (2013, 'Lost connection to MySQL server during query') ::: - Details: | variable | value | |:------ |:----------- | | Error number | 2013 | | Symbol | CR_SERVER_LOST | | Message | Lost connection to MySQL server during query | - Encountered by: EZID API/UI - Logs: [See here](#EZID-APIUI-logs) - Potential reasons for failure: - Networking issues at RDS server. - If millions of rows are being sent as part of one or more queries. - Aborted connects? - No aborted connects were encountered by the query on `PROD`: `SHOW GLOBAL STATUS LIKE 'Aborted_connects';` - See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html) for more details ### Error 2 :::danger OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd....' (111)") ::: - Details: | variable | value | |:------ |:----------- | | Error number | 2003 | | Symbol | CR_CONN_HOST_ERROR | | Message | Can't connect to MySQL server on '%s' (%d) | - Encountered by: EZID API/UI - Logs: [See here](#EZID-APIUI-logs) - Potential reasons for failure: - See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/can-not-connect-to-server.html) for more details. ### Error 3 :::danger OperationalError: (2006, 'MySQL server has gone away') ::: - Details: | variable | value | |:------ |:----------- | | Error number | 2006 | | Symbol | CR_SERVER_GONE_ERROR | | Message | MySQL server has gone away | - Encountered by: EZID Link Checker - Logs: [See Here](#EZID-Link-Checker-logs) - Potential reasons for failure: - You have encountered a bug where the server died while executing the query. - If you send a query to the server that is incorrect or too large. - See [dev.mysql.com](https://dev.mysql.com/doc/refman/5.7/en/gone-away.html) for more details. ## Possible solutions? :::info Given that all the services are encountering issues during a specifc time frame, it might be useful to check if they are failing over any **AWS RDS Events** and reviewing **MySQL error logs**. ::: How To: [Check AWS RDS Events](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ListEvents.html) [Review MySQL error logs](https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_LogAccess.Concepts.MySQL.html#USER_LogAccess.MySQL.Errorlog) #### Handle specific cases If the failure occurs due to any of the following reasons: > 1. connection times out - If a connection times out, increase the timeouts for MySQL by increasing the **wait_timeout** and **interactive_timeout** parameters by using a custom parameter group. > 2. the query that is generating the error is retrieving a large data set - Increase the **max_allowed_packet** size parameter by using a custom parameter group. > 3. If millions of rows are being sent as part of one or more queries - Try increasing **net_read_timeout** from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete. > 4. You have encountered a bug where the server died while executing the query. - Get the query that crashed the server and try to fix it. ## Logs ### EZID API/UI logs ```log /ezid/apache/ezid/SITE/logs/transactions.log 2019-08-22 12:22:00,199 1d3ae118c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/115059 anonymous anonymous anonymous anonymous False 2019-08-22 12:22:00,210 1d3ae118c51211e9b6b30a9901e9a631 END BADREQUEST 2019-08-22 12:22:11,079 23b70012c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/900617 anonymous anonymous anonymous anonymous False 2019-08-22 12:22:11,091 23b70012c51211e9b6b30a9901e9a631 END BADREQUEST 2019-08-22 12:22:31,959 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=0 operationCount=6 2019-08-22 12:22:37,637 338b7040c51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901508 anonymous anonymous anonymous anonymous False 2019-08-22 12:22:37,649 338b7040c51211e993c60a9901e9a631 END BADREQUEST 2019-08-22 12:22:43,378 36f76dbac51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/107166 anonymous anonymous anonymous anonymous False 2019-08-22 12:22:43,390 36f76dbac51211e9932d0a9901e9a631 END BADREQUEST 2019-08-22 12:22:48,183 39d4ad54c51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/904436 anonymous anonymous anonymous anonymous False 2019-08-22 12:22:48,195 39d4ad54c51211e9932d0a9901e9a631 END BADREQUEST 2019-08-22 12:22:49,393 3a8d395ac51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi 2019-08-22 12:22:49,410 3a8d395ac51211e993c60a9901e9a631 END SUCCESS 1699 2019-08-22 12:22:49,411 3a900374c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi 2019-08-22 12:22:49,415 3a900374c51211e993c60a9901e9a631 END SUCCESS 10 2019-08-22 12:22:53,155 3ccb4f54c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:22:53,405 3ccb4f54c51211e993c60a9901e9a631 END SUCCESS 0 2019-08-22 12:22:53,406 3cf19f42c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:22:53,634 3cf19f42c51211e993c60a9901e9a631 END SUCCESS 0 2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 BEGIN mintIdentifier ark:/99999/fk4 merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d 2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 END SUCCESS ark:/99999/fk4ff51c9m 2019-08-22 12:23:04,439 438515e6c51211e993c60a9901e9a631 BEGIN createIdentifier ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d 2019-08-22 12:23:04,473 438515e6c51211e993c60a9901e9a631 END SUCCESS 2019-08-22 12:23:04,546 439577d8c51211e993c60a9901e9a631 BEGIN setMetadata ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d 2019-08-22 12:23:04,577 439577d8c51211e993c60a9901e9a631 END SUCCESS 2019-08-22 12:23:11,315 479e652ec51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901440 anonymous anonymous anonymous anonymous False 2019-08-22 12:23:11,328 479e652ec51211e993c60a9901e9a631 END BADREQUEST 2019-08-22 12:23:17,725 4b704bc2c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:23:32,739 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=1 operationCount=12 2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 END SUCCESS ark:/88435/dsp011v53k084w 2019-08-22 12:23:36,202 5673cb16c51211e986c90a9901e9a631 BEGIN createIdentifier ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 2019-08-22 12:23:39,296 5673cb16c51211e986c90a9901e9a631 END SUCCESS 2019-08-22 12:23:39,456 58645a1cc51211e986c90a9901e9a631 BEGIN setMetadata ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp011v53k084w 2019-08-22 12:23:43,197 58645a1cc51211e986c90a9901e9a631 END SUCCESS 2019-08-22 12:23:44,279 5b441bfac51211e9932d0a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:23:44,847 5b9ac414c51211e9b6b30a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:23:46,689 5cb408ecc51211e9882d0a9901e9a631 BEGIN getMetadata doi:10.7284/117233 anonymous anonymous anonymous anonymous False 2019-08-22 12:23:47,616 5cb408ecc51211e9882d0a9901e9a631 END BADREQUEST 2019-08-22 12:23:48,538 5b9ac414c51211e9b6b30a9901e9a631 END SUCCESS 0 2019-08-22 12:23:48,539 5dce22b2c51211e9b6b30a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q 2019-08-22 12:23:50,488 5ef7a80cc51211e986c90a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi 2019-08-22 12:23:50,833 5dce22b2c51211e9b6b30a9901e9a631 END SUCCESS 0 2019-08-22 12:23:50,856 5ef7a80cc51211e986c90a9901e9a631 END SUCCESS 1699 2019-08-22 12:23:50,857 5f2fddeec51211e986c90a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi 2019-08-22 12:23:52,535 5f2fddeec51211e986c90a9901e9a631 END SUCCESS 10 2019-08-22 12:23:59,568 4b704bc2c51211e993c60a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query') 2019-08-22 12:23:59,568 5b441bfac51211e9932d0a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query') 2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:00,873 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:05,864 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:05,910 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:05,928 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:05,956 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:10,941 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:10,960 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:10,991 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:15,917 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:15,963 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:15,994 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:16,026 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)") 2019-08-22 12:24:25,117 739bb096c51211e9934b0a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 2019-08-22 12:24:25,118 739bb096c51211e9934b0a9901e9a631 END SUCCESS ark:/88435/dsp01x346d703k 2019-08-22 12:24:25,118 739bc1c6c51211e9934b0a9901e9a631 BEGIN createIdentifier ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 2019-08-22 12:24:25,180 739bc1c6c51211e9934b0a9901e9a631 END SUCCESS 2019-08-22 12:24:25,349 73bf000ac51211e9934b0a9901e9a631 BEGIN setMetadata ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp01x346d703k 2019-08-22 12:24:25,387 73bf000ac51211e9934b0a9901e9a631 END SUCCESS ``` ### EZID Link Checker logs ```log /ezid/apache/ezid/SITE/logs/link-checker.log 2019-08-22 11:11:43,928 begin processing 2019-08-22 12:01:44,933 end processing, checked 3094 links at 1.0 links/s, slept 03:46:04 2019-08-22 12:01:44,933 begin load workset 2019-08-22 12:01:45,895 end load workset, 10 owners (7 capped), 3546 links, 500 unvisited links, 2433 good links (oldest=229d), 613 bad links (oldest=2d) 2019-08-22 12:01:45,896 begin processing 2019-08-22 12:23:59,580 unhandled exception in worker thread ``` ## Useful links: 1. [AWS rds-mysql-server-gone-away ](https://aws.amazon.com/premiumsupport/knowledge-center/rds-mysql-server-gone-away/) 2. [dev.mysql.com : MySQL server has gone away ](https://dev.mysql.com/doc/refman/5.7/en/gone-away.html) 3. [dev.mysql.com : Can't connect to MySQL server ](https://dev.mysql.com/doc/refman/5.7/en/can-not-connect-to-server.html) 4. [dev.mysql.com : Lost connection to MySQL server ](https://dev.mysql.com/doc/refman/5.7/en/error-lost-connection.html)