# Using async_calls for cron jobs

A description of the current implementation is below. The general idea is to refactor cron jobs so that execution occurs by the same mechanism as async calls. This can be done by adding a `cron_job_key` to `async_calls` and removing either the state fields in `cron_jobs` or the entire table.

## Proposal 1:

Use the `async_calls` db table to manage the state of cron jobs by adding a `cron_job_key` field to `async_calls`, and remove these fields from the `cron_jobs` table: `deployment_id`, `state`.

A cron sync job will run periodically in the cronjob service that will:

- Get all rows in the `cron_jobs` table that have `start_time > now` and `next_execution < now + 1h`
- Find all async calls with that `cron_job_key` whose state is `pending` and `scheduled_at > now`
- Filter the `cron_jobs` rows to find the set that have a `next_execution` time in the next 1h but are not scheduled in the `async_calls` table
- Create `async_calls` rows for the unscheduled cron jobs, set `scheduled_at` to `next_execution`, and determine the following `next_execution` time
- Update the `cron_jobs` table with the new `next_execution` time
- Cycle through the loop at the next upcoming `scheduled_at`

Open questions:

- Multiple controllers may race
  - Continue using the hashring?
  - Maybe only one “lead” controller should be doing all the above?
  - Lock on the `cron_jobs` table and allow multiple controllers?

## Proposal 2:

Use the `async_calls` db table to manage the state of cron jobs by adding a `cron_job_key`, and remove the `cron_jobs` table entirely.

A cron sync job will run periodically in the cronjob service, given the cron jobs that the controller finds in the schema. This is similar to the above, but uses in-memory state instead of the `cron_jobs` table.

Open questions:

- Do we need the `cron_jobs` table for anything? Console?
- How do we avoid multiple controllers racing?

## Proposal 3:

Do proposal 1, but instead of keeping a `cron_job_key` in the `async_calls` table, store the `async_call_id` in the `cron_jobs` table. The process would change to:

- Retrieve the last executed call for each cron job
- Compute the `next_execution` time based on the last `scheduled_at`
- Sleep until the earliest execution

## Other proposals?

_open to ideas..._

## Current implementation

_This is to help with my understanding of the implementation but might be useful to others._

When creating a deployment, the controller adds rows to `cron_jobs` for each cron job it finds in the schema. For each job, it sets the `start_time` to the current time, and the `next_execution` time to the next valid execution time (e.g. Monday 9am, or a minute from now).

When a deployment is replaced, the controller’s cronjob service publishes a `syncEvent` with all the jobs in the `cron_jobs` db table and the new `deploymentKey`. The `syncEvent` is also published every minute by the cronjob service, along with a `killOldJobs` event. Other events that are published are an `endedJobsEvent`, when a job has finished executing or has been killed by `killOldJobs`, and an `updatedHashRingEvent`, which is published every 5s to synchronize the hash ring with the active controllers if the active controller list has changed.

Finally, the cronjob service’s `watchForUpdates` function subscribes to the above events and maintains state about the jobs it knows about, the jobs it is currently executing, and when it should attempt to execute the next job.

- When the `next` timer has elapsed, it picks up jobs that are able to be executed (including checking the hashring to avoid picking up jobs if there are enough other controllers to spread the load to), and it attempts to mark these jobs as `executing` in the `cron_jobs` db table.
- For each job it was able to update to an `executing` state, if the deployment has more than 0 min replicas, the controller starts executing the job. If a job’s deployment’s min replicas are set to 0, the job is removed from the controller’s list of jobs. Jobs that are not in the `executing` state are removed from the controller’s list of currently executing jobs.
- When an `endedJobsEvent` is received, as above, jobs that aren’t in the `executing` state are removed from the list of currently executing jobs.
- When a `syncEvent` is received, the cronjob service’s job list is updated with the list of jobs in the event, the jobs that are not in the `executing` state are removed from the list of executing jobs, and the `newJobs` list is updated with the jobs whose `deploymentKey` matches that in the event. The `newJobs` list doesn’t appear to be used.
- When an `updatedHashRingEvent` is received, it’s a no-op that only causes the loop to cycle, to see if new jobs need to be scheduled.
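
The event handling described above can be pictured as a small state reducer. The sketch below is an illustration only and is not taken from the controller: the event names come from the description, but their fields, the `Job` and `state` types, the `handle` and `dropNotExecuting` helpers, and the `"idle"`/`"executing"` state strings are assumptions. It leaves out the timer, the hashring check, the db update, and the unused `newJobs` list.

```go
package main

import "fmt"

// Job is a hypothetical in-memory view of a `cron_jobs` row.
type Job struct {
	Key           string
	DeploymentKey string
	State         string // assumed values: "idle" or "executing"
}

// Event payloads; the field layout is assumed for this sketch.
type syncEvent struct {
	jobs          []Job
	deploymentKey string
}
type endedJobsEvent struct{ jobs []Job }
type updatedHashRingEvent struct{}

// state is what the loop keeps between events.
type state struct {
	jobs      []Job           // every job the service knows about
	executing map[string]bool // keys of jobs this controller is executing
}

// dropNotExecuting forgets any listed job that is no longer `executing`.
func (s *state) dropNotExecuting(jobs []Job) {
	for _, j := range jobs {
		if j.State != "executing" {
			delete(s.executing, j.Key)
		}
	}
}

// handle applies a single event to the state.
func (s *state) handle(event any) {
	switch e := event.(type) {
	case syncEvent:
		// Replace the known job list, then prune the executing set.
		s.jobs = e.jobs
		s.dropNotExecuting(e.jobs)
	case endedJobsEvent:
		s.dropNotExecuting(e.jobs)
	case updatedHashRingEvent:
		// No state change: receiving the event is enough to make the
		// surrounding loop re-evaluate which jobs to schedule.
	}
}

func main() {
	s := &state{executing: map[string]bool{"a": true, "b": true}}
	s.handle(syncEvent{jobs: []Job{
		{Key: "a", State: "executing"},
		{Key: "b", State: "idle"},
	}})
	fmt.Println("known jobs:", len(s.jobs), "still executing:", len(s.executing))
}
```
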