---
tags: SA
---
# UoD Power outage 2020-10-10
## References
- [image.sc announcement](https://forum.image.sc/t/ome-resources-down-due-to-uod-outage/43957)
- [UoD services spreadsheet](https://docs.google.com/spreadsheets/d/1P6F9s9DS0bp372VQUPxCLDYKdPoREg7_KFdNzEd-C-8/edit#gid=0)
## 2020-10-15
Attending: Seb, Jason, Josh, Simon, J-M, Will, Dom, Petr, June, Frances
### Status
- Code references (Josh/Chris)
- upgrade checks (known)
- `omero import` --> download hangs
- CloudFlare, S3, etc.
- Where on the priority list?
- Shim aka new-registry (Josh/Chris)
- Lambdas: GS: Haven't paid a dollar (UoD: dollars a month)
- Storage: digital ocean postgres (multi-master) no effort
- 120-160 USD / month
- 120 GB (1 standby, auto-failover)
- Note: flattened IP table
- Steps
- Update https://github.com/ome/qa-shim
- Create database
- Spin up via terraform
- Change DNS
- **Hardware**
- Simon: was working with JD. All production VMs are back online esp. high priority. Some networking issues.
- Engineer coming this morning to fix remaining GPFS boxes. No redundancy
- Should have no downtime
- www.openmicroscopy.org
- deployed on GitHub pages
- several components missing (QA, forums, site redirects, Schemas). Causes PRs to fail validation
- Options: switch back to ome-www or keep working with GH pages (need redirect work)
- Simon: in-progress work on redirects
- J-M: GH pages seems to be the way
- Jason: main concern is that spending time now means splitting resources
- J-M: if 1-2 days away, worth investigating
- Test website and add the schemas by EOB
- List all broken
- gate.openmicroscopy.org
- Working with ports 22 + 443
- downloads.openmicroscopy.org/docs.openmicroscopy.org
- Seem to be working. Tested via several builds
- J-M: looking at it. Everything is working
- Petr: presentations **ok**
- artifacts.openmicroscopy.org
- currently redirected to GS artifactory
- some issues with artifacts
- question of whether we switch back to our own artifactory
- J-M: working on Maven Central/Sonatype
- Focusing on Bio-Formats first
- Seb: wait until we need to push artifacts before switching?
- How many days of investigation before deciding to switch to the old workflow?
- J-M: need to sort out various issues (shared user, GPG)
- David: 1 week necessary to test various things
- learning.openmicroscopy.org
- Jason to email Paul and ask to test. Minimal login works
- Josh: only potential issue would be corruption when PSQL would shut down
- No reason to think otherwise?
- J-M: did some testing on outreach/workshop
- Josh: assuming initial outage only looked like power outage
- Simon: a priori yes
- outreach.openmicroscopy.org/workshop.openmicroscopy.org
- Petr: fully working
- demo.openmicroscopy.org/pub-omero.openmicroscopy.org
- Petr: fully tested and working (**yes** per spreadsheet)
- Seb: looking at pub-omero today
- nightshade.openmicroscopy.org
- Simon: starting the omero-server should be fine
- Josh: assuming no storage bump. Just access problem?
- Jason: nightshade mailing list?
- Simon: wait for GPFS to be back
- Josh: we're still in maintenance window
- Petr: to start drafting
- idr-redmine.openmicroscopy.org
- Frances: confirmed it's working. updating issues
- merge-ci.openmicroscopy.org/latest-ci.openmicroscopy.org
- Seb: could not ssh into the boxes
- Simon to check the DRAC
- ci.openmicroscopy.org
- running
- J-M: test these jobs if we are not switching the release workflow
- image.sc
- Seb to draft
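
The per-service checks above (e.g. gate working on ports 22 + 443) could be repeated with a small script. This is a minimal sketch, not what was actually run during the outage: `port_open` and `check_services` are hypothetical helper names, and only the gate port list comes from these notes.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


def check_services(
    services: dict[str, list[int]], timeout: float = 3.0
) -> dict[str, dict[int, bool]]:
    """Map each host to {port: reachable} for the ports listed for it."""
    return {
        host: {port: port_open(host, port, timeout) for port in ports}
        for host, ports in services.items()
    }
```

For example, `check_services({"gate.openmicroscopy.org": [22, 443]})` returns one boolean per port. A successful TCP connection only shows that something is listening, so application-level checks (login, builds) are still needed.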
## 2020-10-14
Attending: Will, Seb, Petr, Dom, Josh, Simon, Frances, J-M, Jason, David
### Status
- Hardware
- Simon: 2 arrays down. One out-of-support, one under warranty. Out-of-support one was fixed at no charge. Dell engineer or UoD IT person needs to go to Data Centre (need to ask who)
- Getting VMWare back online requires UoD IT technician
- Potentially getting VMWare back except nightshade, devspaces
- Impact on current services of priority 4-5
- gate back
- downloads/docs still unavailable
- other web-prod would work
- website would come back
- nightshade requires GPFS fix
- demo/learning could come back
- ome-lochy (Redmine, monitoring): requires GPFS for persistent storage, but could potentially bring back Redmine without attachments
- Artifactory ("running")
- Josh: DNS updated from UoD to GS.
- Mirrors scijava.
- Note that actions can timeout while mirroring is taking place.
- There may be JARs not there. Caching fun
- Some changes on GS side at code level to be reviewed (bioformats2raw)
- Long-term plan: are we rolling back or switching to GS artifactory?
- Jason: much harder discussion. Problems we are having also affect NPSC, proteomics, MRC...
- Seb: artifactory is one of the first to come up
- Simon: need resiliency
- Seb: have 3 of them effectively (Scijava + OME + GS) containing a portion of the artifacts
- Simon: encode them in the builds to start relying on them
- JRS: need to make this decision about what is the ideal.
- JMB: pushing to maven central will also help (may be easier to push these days). More resiliency.
- JMB: can't yet build. i.e. not "fixed"
- Investigations
- See artifactory discussion above
- Website (Simon):
- website is in 2 parts. static content straightforward to move to GH pages
- get our static content on GH pages?
- Long-term could look at CDN
- JRS: minimal viable? Seb: static pages and then need things to add. Most critical is **/Schemas**
- Simon: basically same as https://snoopycrimecop.github.io/www.openmicroscopy.org/
- JRS: use one as the backup? Simon: could keep a list of DNS changes if there's another outage.
- Docs (J-M): looked at various hosting options (e.g. GitHub pages). javadoc should be automatically available via javadoc.io.
- David available to discuss Bio-Formats deployments?
- Josh: capture the impossibility to build first. J-M: to start with omero-model
- Petr: snoopy + www
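
The artifactory discussion above (three repositories, Scijava + OME + GS, each holding a portion of the artifacts) amounts to trying repositories in order until one has the artifact. A minimal sketch of that lookup, assuming the standard Maven repository path layout; the function names are hypothetical, and the repository base URLs are left to the caller because the notes do not list them.

```python
from typing import Callable, Iterable, Optional
from urllib.request import Request, urlopen


def artifact_path(group_id: str, artifact_id: str, version: str, ext: str = "jar") -> str:
    """Build the standard Maven repository path for an artifact."""
    group_path = group_id.replace(".", "/")
    return f"{group_path}/{artifact_id}/{version}/{artifact_id}-{version}.{ext}"


def url_exists(url: str, timeout: float = 10.0) -> bool:
    """Return True if a HEAD request to url answers with a 2xx status."""
    try:
        with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError and HTTPError are OSError subclasses (4xx/5xx, DNS, timeouts).
        return False


def first_available(
    repos: Iterable[str],
    path: str,
    exists: Callable[[str], bool] = url_exists,
) -> Optional[str]:
    """Return the first repository URL that serves path, or None if none do."""
    for base in repos:
        url = f"{base.rstrip('/')}/{path}"
        if exists(url):
            return url
    return None
```

The `exists` parameter is injectable so the ordering logic can be tested without network access. In a real build the same fallback would normally be expressed as multiple repository declarations in the Maven or Gradle configuration rather than in a script.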
### Next steps
- Simon: redirect WWW to GitHub? Seb to change the DNS.
- Simon working on redirects
- Seb working on the Schemas
- Josh: propose **registry shim** with help from Glencoe
- Simon: GDPR ok? Will check.
- J-M + David: look into artifact hosting
- Potentially include Dom for omero-* artifacts
- As soon as VMWare is up, test learning + demo + outreaches
- Simon: expecting no need to rerun the playbooks for Ansible managed systems
- Python: Josh to push omero-web to readthedocs
- Seb: omero-py?
- Misc
- List of External Resources we're focusing on
- PyPI
- ReadTheDocs
- javadoc.io
- Maven Central
- GitHub Pages
- GitHub Releases
- Presentations (Petr)
- Possible to have resiliency?
- Seb: GitHub pages?
- Simon: hitting limits?
- e.g. https://ome.github.io/presentations
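
On the redirect work noted in the next steps: GitHub Pages serves static files only, so one common approach is to generate small HTML stub pages that redirect via `meta refresh`. The notes do not say which approach was used, so this is only a sketch under that assumption; `redirect_page` and `write_redirects` are hypothetical names, and the old paths and targets are supplied by the caller.

```python
import html
from pathlib import Path


def redirect_page(target: str) -> str:
    """Return a static HTML page that redirects the browser to target."""
    safe = html.escape(target, quote=True)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">\n'
        f'<meta http-equiv="refresh" content="0; url={safe}">\n'
        f'<link rel="canonical" href="{safe}">\n'
        "</head><body>\n"
        f'<p>This page has moved to <a href="{safe}">{safe}</a>.</p>\n'
        "</body></html>\n"
    )


def write_redirects(redirects: dict[str, str], out_dir: Path) -> list[Path]:
    """Write <out_dir>/<old_path>/index.html for each old_path -> target pair."""
    written = []
    for old_path, target in redirects.items():
        page = out_dir / old_path.strip("/") / "index.html"
        page.parent.mkdir(parents=True, exist_ok=True)
        page.write_text(redirect_page(target), encoding="utf-8")
        written.append(page)
    return written
```

Stub pages of this kind return HTTP 200 rather than a 301/302, so tools that expect a real HTTP redirect (for example schema validators fetching a URL) may not follow them.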