---
tags: research-computing, handover
---
# ARC4+ thoughts
## JH notes before tidying
I still can't get my head round the high-level logic. I don't get why we'd choose to decommission ARC3 before doing ARC4+.
If our plan were to delete ARC3, install ARC5 (with a managed service contract), then rebuild ARC4 in the style of ARC5, I'd get it.
I'm not sure I understand the time costs of the tasks. The Lustre rebuild is 50 days of effort. I'm missing something here: why is that 50 days, when Task 5, which appears to cover setting up all the new scheduler infrastructure, creating the compute and GPU images, and setting up ARC4+ and moving people over to it, is only 30? I'm assuming this also implies a very loose coupling between the Lustre client and server versions.
Task 7 is a task that will keep on giving.
## AC scatty notes
- As with JH's point, the sequencing is a bit interesting: lifting and shifting users from ARC3 -> ARC4 is not necessarily straightforward (lots of modules need moving, plus automated workflow configuration, cryo-EM stuff, etc.).
### Task 1
> Nothing other than GPU nodes is going to be specifically re-purposed as part of this task
Not even the high-memory nodes? That leaves us with just 2 on ARC4 (losing the 4 on ARC3).
## NR Comments/Notes
- There is mention of a series of questions that have not been answered yet; do you have a copy of the questions?
- What is the planned lifetime of ARC4.5? Will the transplanted ARC3 GPU nodes last (warranty?), and what if they fail (can we get spare parts)?
- How much of the work would we expect Alces to do?
- Is a single-path approach wise? Should we not have the opportunity to weigh up options (software/configurations) to learn what will work best for us? Surveys will only get us so far.
- How much lead time do we need when notifying ARC users of downtime, to minimise disruption to projects?
- How much time/effort will we need to retrain the research community on how to use ARC?
- Has the time/effort needed to rewrite existing documentation and training material been considered?
- Is there any provision for training staff on the new technologies?
## MC Questions for Red Oak
- Crucially, what is the timeline for this work?
  - Is this proposing that ARC3 be decommissioned before ARC4Plus is available (Task 2)?
  - If so, why?
  - How much of the work can be carried out in parallel, or is it envisaged as a series of sequential tasks?
  - What has been assumed about the time needed to get the software stack "right" and ready to accept real research jobs (given the drain-down plan from ARC4 to ARC4Plus in Task 4)?
  - What overall clock time is realistically expected from start to finish, allowing for skills acquisition and reasonable (declared) assumptions on procurement?
- What (in total) is the expected resource contribution from Leeds?
  - Which skills are required, and where in the timeline?
  - How does it affect the "critical path"?
  - How much could be bought in and how much would need to be in-house?
- Proposed final ARC4Plus service
  - What assumptions have been made (in time, cost and impact) when comparing the Lustre options?
  - Earlier discussions included use of a separate provider to drive the delivery of the service. Is that still the assumption and, if so, how are the work packages divided?
  - How do Tasks 5, 6 and 7 relate to Task 4?
  - What are the assumptions about how jobs will be scheduled to CycleCloud?
## MTC additional thoughts
* Timeline, start and end dates (as mentioned previously)
* Lots of hidden detail within all this, which needs exploration and discovery, for example:
  * How we host home directories
  * We (currently) have no staff in place who are experienced and qualified to support any of this work. Lead time on this is likely to be more than 3 months