# ICAT: E-Logbooks Discussion

## Open questions

- What do users want?
  - At least one of the ESRF's users described their requirements - these were already covered by their solution
- Horses for courses - one solution may not make everyone happy
  - How many are needed?

## Other off-the-shelf solutions

Want to limit the number of solutions that need to be maintained, but suspect it's more than one.

- eLabFTW
- NOMAD
  - The ability to export ICAT metadata to NOMAD might be in the future for HZB
  - ESRF: more useful for analysis, not going to replace the logbook
- Jupyter-enabled / analysis solutions
  - ESRF have a bespoke solution in DataHub, running with Slurm (somehow?)
    - Opens a new tab in the browser
    - No direct integration with ICAT
  - HZB do not have this (yet) - but see it as analysis and therefore not part of the labbook/logbook space
  - ESRF want forms that run set procedures based on ICAT metadata (sample, type, beamline, technique etc.)
    - Narrows down to e.g. one parameter
    - Handled by a separate API/black box that maps these things
  - STFC uses the DAaaS platform for users to run analysis in a virtual environment

## ESRF Logbook (Mongo)

- Contains "everything important to write down"
- Want to centralise with the rest of ICAT - but not extend or abuse the ICAT schema
- Uses ICAT for authn/authz
- Uses TinyMCE for the formatted text entry
  - Supports different formatting options; ESRF allow a subset
  - Needs to have copy/paste keyboard shortcuts
- Editable during embargo, read-only afterwards for everyone
- Tags:
  - Tag (types) can be created at different levels
  - Human logs are "annotations", machine-written logs are "notifications"
  - Notifications are generic, or beamline specific
- Convincing beamline scientists to integrate with the logbook is desirable
- 6 years: 14 million documents, 8 GB of information in total
  - Most of these are "notifications", not "annotations"
- Creating information not tied to a session
  - Beamlines are not always running, but scientists still want to log things
  - Rolf: create a "fake" session/investigation and make logs against that
- Convert-to-PDF functionality
  - Will be different depending on your viewing settings
  - Takes a few seconds to generate
- Searchability:
  - Should show the log, then clicking it brings you to that exact log in the context of the logs around it
- Dataset integration - it would be nice to be able to click on paths to the dataset and link to it, but NYI (will it be?)
- Estimate ~20% uptake
  - Some users rely on it (especially remote users)
- Q (Allan): How difficult is it to allow external software to register logs in the e-logbook? I am thinking of software used for beamline control, such as MXCuBE, SPEC (proprietary software), etc.
  - Just a call to the API, should be easy to do
  - Uses a "magic" sessionId for machines (users authenticate differently, via ICAT); the API key is fixed and compared to config options on the backend

## "Standalone" Logbook

NB: the "standalone" version is actually just the full DataHub with everything but the logbook turned off.

- Changes relative to ESRF's ~2 year old starting point:
  - Facility-specific branding/styling
  - PDF formatting
  - Introduction of base URLs for frontend/backend components
  - SSO plugin for "token" instead of "password"
  - Would be nice if all this config was done at runtime, taking values from a single folder
- API refresh - available in the DataHub due to library updates
  - Needed to be added to the standalone version
- Permissioning
  - Needed to change queries (hardcoded in production)
  - Config file specifies the query
    - Null is taken as forbidden, anything else is good
  - isAccessAllowed endpoint - if this was in REST it could be used (checking exact CRUD)
  - Q: format of the query?
    - Not fully configurable in production yet
    - JPQL
- Changed Dockerfile images, so it was available outside ESRF
- Nice to haves:
  - Machine readable - RO-Crate
  - Machine accounts (see above):
    - Q: what does this mean?
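As a rough illustration of what a machine account could mean in practice: the fixed "magic" API key described for the ESRF logbook stands in for a user sessionId, so a control system only has to build and POST a "notification" payload. A minimal sketch follows; the endpoint path, payload fields, and function names are invented for illustration and are not the real ICAT+ API:

```python
import json
import urllib.request

# Hypothetical endpoint shape -- the real ICAT+ logbook API will differ.
# The fixed apiKey in the URL is compared against config on the backend.
LOGBOOK_URL = "https://icatplus.example.org/logbook/{apiKey}/investigation/{id}/event"


def build_notification(beamline: str, message: str, machine: str) -> dict:
    """Build a machine-written "notification" (as opposed to a human "annotation").

    All field names here are guesses at the schema, for illustration only.
    """
    return {
        "type": "notification",  # machine-written entries are "notifications"
        "machine": machine,      # which software wrote this, e.g. a control system
        "content": [{"format": "plainText", "text": message}],
        "tag": [beamline],       # tags can be beamline specific
    }


def post_event(api_key: str, investigation_id: str, event: dict) -> urllib.request.Request:
    """Prepare the HTTP POST; sending it is just urllib.request.urlopen(req)."""
    url = LOGBOOK_URL.format(apiKey=api_key, id=investigation_id)
    return urllib.request.Request(
        url,
        data=json.dumps(event).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Software like MXCuBE would then only need the fixed key in its configuration, with no interactive ICAT login.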
      - Should be a software account which writes to the logbook automatically
    - Concerns over elevated permissions
  - Concerns over mapping of Logbook to ICAT entities

### General points

- Too much was hardcoded "the ESRF way"; it would be better to make it configurable
- Difficult to merge upstream, cannot test side effects
- Fork of DataHub from 2 years ago; want to merge with more recent DataHub code changes
- ESRF are working on a new version of DataHub (still using ICAT+)
  - Moving towards a micro-frontend architecture - look and feel is the same, but code is in a different repo
  - Should be able to plug in to different frontends - i.e. a component is a single page
  - Still need to deploy ICAT+ on the backend - but this should not be seen as a dealbreaker
    - e.g. ESRF/HZB run both ICAT+ and DG-API to get the PaNOSC common search API functionality

## ESRF Notebook (Etherpad)

- Users are resistant to having multiple tools for the "same" job
- Dedicated "overview" cell which is actually a passthrough to the Etherpad panel
- Have had issues with Etherpad technology
  - Data corruption was an issue
  - Cookies are a pain, but can be resolved
  - Users do "something" and this breaks the pad - not identified - either images or tables?
  - Tried it with a MongoDB backend, but this was too big - moved to Postgres
- Etherpad has other functionality, but the main draw is real-time collaboration
- Easier to curate the high-value information
- Quick win at the time

## HZB

- Only a few experiments are connected to the Logbook, but for them it is important
- Issues with session timeouts - implies heavy usage
- Users basing work around it

## STFC

- Working on integrating Etherpad into DataGateway
- Developed an authentication plugin - worked once but still not finished
- Etherpad has started being maintained again
- No issues encountered yet, but not tested in anger yet

## Actions for the future

- Need to merge HZB developments with DataHub version 1
  - Considered but decided against waiting for version 2
    - Want to encounter bugs and issues sooner rather than later
    - No clear timescale on DataHub version 2
    - Any backend integration "generalisations" will be useful in both version 1 and 2 (i.e. work will not be wasted)
- Specific features:
  - Session refresh: solved by DataHub upstream changes, in a different way to the HZB work
- Q: are there big changes to DataHub (since the 2 year fork)?
  - No, the version 2 data portal is in a different repo
    - The underlying technology is (mostly) different; the scope is larger as it includes processed data
  - Tried forking a few months ago and merging weekly, but there are too many changes to keep up with
    - Concerned other features may have been broken alongside the merging
  - Version 2 is https://gitlab.esrf.fr/icat/data-portal
  - A motivation for version 2 is also to get ISPyB
  - Should be "in production" for one beamline by next week
    - This will not include the logbook as it's not a "core" feature
  - Architecture is done but not fully tested
    - Should be possible to start experimenting with it
- Shouldn't need to version-match React (but it might cause issues)
  - ESRF use React 18 and Bootstrap
  - DataGateway use which version of React?
    - Think 17 (open issue for going to 18), and MUI 5
- Create issues on the ESRF repo for HZB's requests, merge them in the future
- Have meetings a few times a year to go over big features?

### HZB feature requests

- Some scientists worry about dependency on central services
  - Want to continue working even if the network goes down
  - Can we keep "offline" revisions, then send them from the client when needed?
    - Local browser storage keeps the WIP edit
- The ICAT+ API sends the notifications from the beamlines, but want to use RabbitMQ or similar to broker the machine messages
  - Consumed by ICAT+ (probably; maybe a dedicated instance to reduce load)
  - User logs could also go via an MQ
- Local copy of logbook? Should all be saved in the browser, with confirmations - these are flushed when coming back online, and get confirmed
  - Logbook entries are not "merged" like Git
    - New entries do not "overwrite" previous entries; they just appear as the most recent revision and are shown in the UI
    - However, there is no notification to the user that someone else may have edited in the meantime
  - Is this feasible?
- Once the investigation is done, want to store the logbook in a machine-readable format, as a single file, alongside all the data - not dependent on the running service
  - PDF export exists, but this is for humans
  - HTML and plaintext are also possible currently
  - Want something like JSON/XML? This should be possible as MongoDB is already JSON
  - The question is when to make this dump - should be a script running before the embargo ends
  - Might want to "clean" the final version by removing intermediate edits
    - Should be possible, based on how you take the dump
- Exchange formats
  - RO-Crate comes up the most
- Make logbooks available as datasets? If it is machine readable, it can then be imported by other software
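The pre-embargo dump discussed above could be as simple as the following sketch: take the events for an investigation, optionally drop intermediate revisions ("cleaning"), and write one JSON file alongside the data. The field names (`entryId`, `revision`, `createdAt`) are guesses at the schema, and in a real script the events would come from the MongoDB collection (e.g. via pymongo) rather than an in-memory list:

```python
import json


def latest_revisions(events: list) -> list:
    """Keep only the newest revision of each logical entry.

    Assumes each event carries an entry id and a monotonically increasing
    revision number -- both field names are hypothetical, not the real schema.
    """
    newest = {}
    for event in events:
        key = event["entryId"]
        if key not in newest or event["revision"] > newest[key]["revision"]:
            newest[key] = event
    # Return entries in chronological order of creation.
    return sorted(newest.values(), key=lambda e: e["createdAt"])


def dump_logbook(events: list, path: str, clean: bool = True) -> None:
    """Write the logbook as a single machine-readable JSON file.

    With clean=True, intermediate edits are removed from the final version.
    """
    final = latest_revisions(events) if clean else events
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(final, fh, indent=2, default=str)
```

Run before the embargo ends, this leaves a self-contained file that does not depend on the running service, and which a later step could wrap in an exchange format such as RO-Crate.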