Try   HackMD

EZID Operational Error

Issue description

  • EZID processes can't connect to the Amazon RDS (Relational Database Service) that is running MySQL and keeps receiving one of these errors.
  • EZID was down for ~3 minutes and operations resumed as soon as the RDS was up.

Observations:

  • Something terribly wrong happend at 12:22 PM (Pacific Time) on the rds-ias-ezid-search production server.

    Note: We had observed similar service failure on August 16, 2019

  • The server came back up at ~12:25 PM (Pacific Time)
  • Ways to regenerate the error:
    • not known

Errors

Error 1

OperationalError: (2013, 'Lost connection to MySQL server during query')

  • Details:

    variable value
    Error number 2013
    Symbol CR_SERVER_LOST
    Message Lost connection to MySQL server during query
  • Encountered by: EZID API/UI

  • Logs: See here

  • Potential reasons for failure:

    • Networking issues at RDS server.
    • If millions of rows are being sent as part of one or more queries.
    • Aborted connects?
      • No aborted connects were encountered by the query on PROD:
        SHOW GLOBAL STATUS LIKE 'Aborted_connects';
    • See dev.mysql.com for more details

Error 2

OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd' (111)")

  • Details:

    variable value
    Error number 2003
    Symbol CR_CONN_HOST_ERROR
    Message Can't connect to MySQL server on '%s' (%d)
  • Encountered by: EZID API/UI

  • Logs: See here

  • Potential reasons for failure:

Error 3

OperationalError: (2006, 'MySQL server has gone away')

  • Details:

    variable value
    Error number 2006
    Symbol CR_SERVER_GONE_ERROR
    Message MySQL server has gone away
  • Encountered by: EZID Link Checker

  • Logs: See Here

  • Potential reasons for failure:

    • You have encountered a bug where the server died while executing the query.
    • If you send a query to the server that is incorrect or too large.
    • See dev.mysql.com for more details.

Possible solutions?

Given that all the services are encountering issues during a specifc time frame, it might be useful to check if they are failing over any AWS RDS Events and reviewing MySQL error logs.

How To:
Check AWS RDS Events
Review MySQL error logs

Handle specific cases

If the failure occurs due to any of the following reasons:

  1. connection times out
  • If a connection times out, increase the timeouts for MySQL by increasing the wait_timeout and interactive_timeout parameters by using a custom parameter group.
  1. the query that is generating the error is retrieving a large data set
  • Increase the max_allowed_packet size parameter by using a custom parameter group.
  1. If millions of rows are being sent as part of one or more queries
  • Try increasing net_read_timeout from its default of 30 seconds to 60 seconds or longer, sufficient for the data transfer to complete.
  1. You have encountered a bug where the server died while executing the query.
  • Get the query that crashed the server and try to fix it.

Logs

EZID API/UI logs

/ezid/apache/ezid/SITE/logs/transactions.log

2019-08-22 12:22:00,199 1d3ae118c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/115059 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:00,210 1d3ae118c51211e9b6b30a9901e9a631 END BADREQUEST
2019-08-22 12:22:11,079 23b70012c51211e9b6b30a9901e9a631 BEGIN getMetadata doi:10.7284/900617 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:11,091 23b70012c51211e9b6b30a9901e9a631 END BADREQUEST
2019-08-22 12:22:31,959 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=0 operationCount=6
2019-08-22 12:22:37,637 338b7040c51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901508 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:37,649 338b7040c51211e993c60a9901e9a631 END BADREQUEST
2019-08-22 12:22:43,378 36f76dbac51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/107166 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:43,390 36f76dbac51211e9932d0a9901e9a631 END BADREQUEST
2019-08-22 12:22:48,183 39d4ad54c51211e9932d0a9901e9a631 BEGIN getMetadata doi:10.7284/904436 anonymous anonymous anonymous anonymous False
2019-08-22 12:22:48,195 39d4ad54c51211e9932d0a9901e9a631 END BADREQUEST
2019-08-22 12:22:49,393 3a8d395ac51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi
2019-08-22 12:22:49,410 3a8d395ac51211e993c60a9901e9a631 END SUCCESS 1699
2019-08-22 12:22:49,411 3a900374c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi
2019-08-22 12:22:49,415 3a900374c51211e993c60a9901e9a631 END SUCCESS 10
2019-08-22 12:22:53,155 3ccb4f54c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:22:53,405 3ccb4f54c51211e993c60a9901e9a631 END SUCCESS 0
2019-08-22 12:22:53,406 3cf19f42c51211e993c60a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:22:53,634 3cf19f42c51211e993c60a9901e9a631 END SUCCESS 0
2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 BEGIN mintIdentifier ark:/99999/fk4 merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,438 43850556c51211e993c60a9901e9a631 END SUCCESS ark:/99999/fk4ff51c9m
2019-08-22 12:23:04,439 438515e6c51211e993c60a9901e9a631 BEGIN createIdentifier ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,473 438515e6c51211e993c60a9901e9a631 END SUCCESS
2019-08-22 12:23:04,546 439577d8c51211e993c60a9901e9a631 BEGIN setMetadata ark:/99999/fk4ff51c9m merritt ark:/99166/p9td9n725 merritt ark:/99166/p9pn8xd8d
2019-08-22 12:23:04,577 439577d8c51211e993c60a9901e9a631 END SUCCESS
2019-08-22 12:23:11,315 479e652ec51211e993c60a9901e9a631 BEGIN getMetadata doi:10.7284/901440 anonymous anonymous anonymous anonymous False
2019-08-22 12:23:11,328 479e652ec51211e993c60a9901e9a631 END BADREQUEST
2019-08-22 12:23:17,725 4b704bc2c51211e993c60a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:32,739 - STATUS pid=17922 threads=22 running activeOperations=0 waitingRequests=0 activeDataciteOperations=0 updateQueueLength=0 binderQueueLength=0 dataciteQueueLength=2 crossrefQueue:archived/unsubmitted/submitted=5/0/0 downloadQueueLength=0 activeSearches=1 operationCount=12
2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:23:36,202 5673ba86c51211e986c90a9901e9a631 END SUCCESS ark:/88435/dsp011v53k084w
2019-08-22 12:23:36,202 5673cb16c51211e986c90a9901e9a631 BEGIN createIdentifier ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:23:39,296 5673cb16c51211e986c90a9901e9a631 END SUCCESS
2019-08-22 12:23:39,456 58645a1cc51211e986c90a9901e9a631 BEGIN setMetadata ark:/88435/dsp011v53k084w pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp011v53k084w
2019-08-22 12:23:43,197 58645a1cc51211e986c90a9901e9a631 END SUCCESS
2019-08-22 12:23:44,279 5b441bfac51211e9932d0a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords https://doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:44,847 5b9ac414c51211e9b6b30a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:46,689 5cb408ecc51211e9882d0a9901e9a631 BEGIN getMetadata doi:10.7284/117233 anonymous anonymous anonymous anonymous False
2019-08-22 12:23:47,616 5cb408ecc51211e9882d0a9901e9a631 END BADREQUEST
2019-08-22 12:23:48,538 5b9ac414c51211e9b6b30a9901e9a631 END SUCCESS 0
2019-08-22 12:23:48,539 5dce22b2c51211e9b6b30a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi keywords doi.org/10.5072/FK2VQ3122Q
2019-08-22 12:23:50,488 5ef7a80cc51211e986c90a9901e9a631 BEGIN search/count - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 owner invemardoi
2019-08-22 12:23:50,833 5dce22b2c51211e9b6b30a9901e9a631 END SUCCESS 0
2019-08-22 12:23:50,856 5ef7a80cc51211e986c90a9901e9a631 END SUCCESS 1699
2019-08-22 12:23:50,857 5f2fddeec51211e986c90a9901e9a631 BEGIN search/results - invemardoi ark:/99166/p9dz03470 invemar ark:/99166/p9jm23k00 -updateTime 0 10 owner invemardoi
2019-08-22 12:23:52,535 5f2fddeec51211e986c90a9901e9a631 END SUCCESS 10
2019-08-22 12:23:59,568 4b704bc2c51211e993c60a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query')
2019-08-22 12:23:59,568 5b441bfac51211e9932d0a9901e9a631 END ERROR OperationalError: (2013, 'Lost connection to MySQL server during query')
2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:00,873 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:00,873 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,864 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,910 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,928 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:05,956 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,941 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,960 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:10,991 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,917 - ERROR download._daemonThread OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,963 - ERROR backproc._backprocDaemon OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:15,994 - ERROR register_async._daemonThread/binder OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:16,026 - ERROR register_async._daemonThread/datacite OperationalError: (2003, "Can't connect to MySQL server on 'rds-ias-ezid-search-prd.cmcguhglinoa.us-west-2.rds.amazonaws.com' (111)")
2019-08-22 12:24:25,117 739bb096c51211e9934b0a9901e9a631 BEGIN mintIdentifier ark:/88435/dsp01 pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:24:25,118 739bb096c51211e9934b0a9901e9a631 END SUCCESS ark:/88435/dsp01x346d703k
2019-08-22 12:24:25,118 739bc1c6c51211e9934b0a9901e9a631 BEGIN createIdentifier ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36
2019-08-22 12:24:25,180 739bc1c6c51211e9934b0a9901e9a631 END SUCCESS
2019-08-22 12:24:25,349 73bf000ac51211e9934b0a9901e9a631 BEGIN setMetadata ark:/88435/dsp01x346d703k pudiglib ark:/99166/p9df6k49v pudiglib ark:/99166/p9j678z36 _target http://dataspace.princeton.edu/jspui/handle/88435/dsp01x346d703k
2019-08-22 12:24:25,387 73bf000ac51211e9934b0a9901e9a631 END SUCCESS
/ezid/apache/ezid/SITE/logs/link-checker.log

2019-08-22 11:11:43,928 begin processing
2019-08-22 12:01:44,933 end processing, checked 3094 links at 1.0 links/s, slept 03:46:04

2019-08-22 12:01:44,933 begin load workset

2019-08-22 12:01:45,895 end load workset, 10 owners (7 capped), 3546 links, 
500 unvisited links, 2433 good links (oldest=229d), 613 bad links (oldest=2d)

2019-08-22 12:01:45,896 begin processing

2019-08-22 12:23:59,580 unhandled exception in worker thread
  1. AWS rds-mysql-server-gone-away
  2. dev.mysql.com : MySQL server has gone away
  3. dev.mysql.com : Can't connect to MySQL server
  4. dev.mysql.com : Lost connection to MySQL server