# MxCube with ICAT

# Goal

The purpose of this document is to outline the current development requirements for using ICAT in MxCube.

The ideal final scenario is an ICAT integration that can:

1) Submit the results of the data acquisition, including all metadata, to both ISPyB and ICAT, so that the full set of features of both user interfaces can be used.
2) Retrieve prior knowledge of the protein and crystal from the new sample tracking system that is connected to ICAT.

Furthermore, the new sample tracking system allows users to define processing strategies. A mechanism needs to be established to send these strategies as input parameters to downstream processes.

# Background

Currently, MxCube uses the SOAP ISPyB web services. As a result of the collaboration, these web services are now deprecated and require replacement. The decision at the ESRF was to replace them with ICAT. While other MxCube collaborators might choose alternative software, no implementation is available as of now.

At the ESRF, it was also decided that ISPyB and ICAT will continue to work in parallel for a period yet to be determined, potentially spanning months or years.
:::warning
Question: Do we want to implement the possibility of switching the sample tracking from ICAT to ISPyB?
:::

The next diagram shows how information is currently transferred from one system to another:

```graphviz
digraph hierarchy {
  labelloc="b"
  label="Architecture based on the ISPyB Sample Tracking System"
  nodesep=1.0 // increases the separation between nodes

  ISPyB_Database [color=blue,shape=cylinder]
  UserOffice [color=black,shape=box]
  User [color=black,shape=box]
  ISPyB [color=blue,shape=box]
  MxCube [color=Red,shape=box]
  ProcessingPipeline [color=black,shape=box]

  User->ISPyB [label="populates"]
  UserOffice->ISPyB [label="populates"]
  ISPyB->{ISPyB_Database}
  ISPyB->MxCube [label="imports sample inf."]
  MxCube->ISPyB [label="pushes dc"]
  MxCube->ProcessingPipeline [label="triggers" style=dashed]
  ProcessingPipeline->ISPyB [label="pushes pipelines"]

  {rank=same;ISPyB MxCube }
}
```

Note that ICAT does not have its own sample tracking system. The "new" sample tracking system is new software developed at the ESRF with its own database. It uses ICAT to retrieve information about investigations, samples, and users.
Since it is independent software, decoupled from the experimental data, the proposed new architecture is:

```graphviz
digraph hierarchy {
  labelloc="b"
  label="Architecture based on the ICAT Sample Tracking System"
  nodesep=1.0 // increases the separation between nodes

  TrackingDatabase [color=blue,shape=cylinder]
  UserOffice [color=black,shape=box]
  User [color=black,shape=box]
  TrackingSystem [color=blue,shape=box]
  MxCube [color=Red,shape=box]
  Ingester [color=Red,shape=box]
  ICAT [color=Red,shape=cylinder]
  ProcessingPipeline [color=black,shape=box]

  User->TrackingSystem [label="populates"]
  UserOffice->TrackingSystem [label="populates"]
  TrackingSystem->{TrackingDatabase}
  TrackingSystem->MxCube
  MxCube->Ingester [label="pushes"]
  MxCube->ProcessingPipeline [label="triggers" style=dashed]
  ProcessingPipeline->Ingester [label="6) push"]

  {rank=same;TrackingSystem MxCube }
  {rank=same;TrackingDatabase;ProcessingPipeline }
  Ingester->ICAT []
}
```

During the MxCube and ISPyB meetings, the development of an 'abstract LIMS' was mentioned, which might eventually allow connection to multiple LIMS in a generic way. This solution is very attractive from a technical perspective, as it abstracts local LIMS implementations. However, given the circumstances and the timescale for implementation at the ESRF, it is not realistic for us to agree on an API for systems that have not yet implemented sample tracking (e.g., py-ispyb and scicat).

# Actions

## 1.- Uploading results to ICAT

Most of the missing metadata pertains to the sample description and location, sourced from the ISPyB sample tracking system. Examples:

- Protein name
- Dewar name
- Puck position
- Etc.

It is up to us to decide whether to implement this first, so that the results pushed to ICAT are complete. The advantage is that this change is relatively quick, as we only need to send a few metadata parameters. The disadvantage is that it will need to be overridden when the new ICAT sample tracking is implemented.
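To give an idea of how small the "send a few metadata parameters" change could be, here is a minimal sketch that maps an ISPyB-style sample record onto some of the ICAT parameter names proposed in the Sample information table later in this document. The nested-dictionary record layout, the function name `build_sample_metadata` and the choice to send every value as a string are assumptions made for illustration; they are not taken from MxCube or ICAT.

```python
from typing import Any, Dict

# ICAT parameter name -> "Table.column" in ISPyB, as listed in the
# Sample information table of this document (subset).
SAMPLE_METADATA_MAP: Dict[str, str] = {
    "SampleProtein_acronym": "Protein.acronym",
    "SampleProtein_name": "Protein.name",
    "SampleChanger_position": "Container.sampleChangerLocation",
    "SampleTrackingParcel_id": "Dewar.code",
    "SampleTrackingContainer_type": "Container.type",
    "SampleTrackingContainer_position": "BLSample.location",
    "SampleTrackingContainer_id": "Container.name",
}


def build_sample_metadata(record: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Return ICAT-style metadata for the fields present in ``record``.

    ``record`` is an ASSUMED layout: one dictionary per ISPyB table,
    e.g. {"Protein": {"acronym": "LYS"}, "Dewar": {"code": "D-01"}}.
    Missing tables or columns are skipped rather than sent as empty values.
    """
    metadata: Dict[str, str] = {}
    for icat_name, ispyb_path in SAMPLE_METADATA_MAP.items():
        table, column = ispyb_path.split(".")
        value = record.get(table, {}).get(column)
        if value is not None:
            # Assumption: metadata values are transmitted as strings.
            metadata[icat_name] = str(value)
    return metadata


if __name__ == "__main__":
    example = {
        "Protein": {"acronym": "LYS", "name": "Lysozyme"},
        "Dewar": {"code": "D-01"},
        "Container": {"type": "unipuck", "name": "P1", "sampleChangerLocation": 3},
        "BLSample": {"location": 7},
    }
    for name, value in build_sample_metadata(example).items():
        print(name, "=", value)
```

Keeping the mapping in a single table-like dictionary means that the override needed later, when the new sample tracking replaces ISPyB as the source, would be limited to changing where `record` comes from.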
If the implementation of the new ICAT sample tracking is not going to be completed soon (e.g., before winter start-up), then it would definitely make sense to push the missing metadata to ICAT. This way, the UI can be fully tested, and users can benefit from the full set of features.

If there is a commitment to implement the sample tracking system so that MxCube is ported to ICAT before the User Meeting in early February 2024, then it would make sense to skip this step.

In general, we need to enrich the metadata of the experiments. The ISPyB collaboration has recently been oriented towards scientific improvements; therefore, more frequent changes are expected in the coming months, including the adoption of ontologies, vocabularies, etc.

:::warning
There are some metadata parameters that need improvement, such as the start and end time of each action (for data acquisition) and processing. Currently, it seems to me that startDate and endDate are the same. [Example](https://data2.esrf.fr/metadata?sampleId=1403966433&filterParam=Date)
:::

Nevertheless, to offer ICAT to users, only a few crucial metadata parameters are missing.

### Sample information

With the following metadata added, the summary should work in the new UI:

| Name | ISPyB Mapping | Description |
| -------- | -------- | -------- |
| SampleProtein_acronym | Protein.acronym | Must match the acronym given in the sample sheet of the A-form |
| SampleProtein_name | Protein.name | |
| SampleChanger_position | Container.sampleChangerLocation | |
| SampleTrackingParcel_id | Dewar.code | Identifier of the parcel |
| SampleTrackingContainer_type | Container.type | puck, unipuck, spinepuck,... |
| SampleTrackingContainer_capacity | Container.capacity | Capacity of the puck (10, 16, 64,...) |
| SampleTrackingContainer_position | BLSample.location | |
| SampleTrackingContainer_id | Container.name | Identifier of the container |

### Characterisation

These parameters are currently stored in the Screening table of ISPyB.

Parameters that define the **strategy**:

| Name | ISPyB Mapping | Description |
| -------- | -------- | -------- |
| MXStrategy_strategy_success | Screening.strategy_success | |
| MXStrategy_indexing_success | Screening.indexing_success | |
| MXStrategy_ranking_resolution | Screening.ranking_resolution | |
| MXStrategy_mosaicity | Screening.mosaicity | |
| MXStrategy_total_exposure_time | Screening.total_exposure_time | |
| MXStrategy_multiplicity | Screening.multiplicity | |
| MXStrategy_completeness | Screening.completeness | |

Parameters that define the **indexing**:

| Name | ISPyB Mapping | Description |
| -------- | -------- | -------- |
| MXStrategy_space_group | | |
| MXStrategy_cell_a | | |
| MXStrategy_cell_b | | |
| MXStrategy_cell_c | | |
| MXStrategy_cell_alpha | | |
| MXStrategy_cell_beta | | |
| MXStrategy_cell_gamma | | |

Parameters that define the **sweep/multi-sweep**:

| Name | ISPyB Mapping | Description |
| -------- | -------- | -------- |
| MXStrategy_sweep_number_images | | Array |
| MXStrategy_sweep_rotation_interval | | Array |
| MXStrategy_sweep_axis_start | | Array |
| MXStrategy_sweep_axis_end | | Array |
| MXStrategy_sweep_rotation_range | | Array |
| MXStrategy_sweep_rotation_axis | | Array |
| MXStrategy_sweep_transmission | | |
| MXStrategy_sweep_oscillation_range | | |
| MXStrategy_sweep_exposure_time | | |
| MXStrategy_sweep_multiplicity | | |
| MXStrategy_sweep_completeness | | |
| MXStrategy_sweep_wavelength | | |
| MXStrategy_sweep_edge_resolution | | |

### Energy Scan and Fluorescence spectra

These two types of scan are not currently implemented in MxCube, so their data acquisition and processing are not pushed to ICAT.
#### Parameters

These are relatively simple scans for which most of the metadata is already defined in ICAT. The remaining parameters can be defined using a vocabulary agreed upon by scientists in the domain.

### Others

Many other parameters, such as instrument configuration, detector, slits, monochromator, etc., need to be stored. This metadata is crucial for ensuring that data is FAIR (Findable, Accessible, Interoperable, and Reusable).

## 2.- New Sample tracking system

There are three areas where modifications should be applied:

1. Authentication: moving from proposal accounts to individual user accounts, using Keycloak
2. Backend: replacing ISPyB calls with ICAT calls
3. UI: making the needed adjustments in the user interface

### Authentication

The use of individual accounts is a strong requirement in the context of cybersecurity. Authentication of individual accounts at the ESRF is done via an open source identity and access management solution called Keycloak, which also provides single sign-on (SSO).

ICAT allows authentication against the Keycloak token and returns an ICAT token, enabling clients to access the endpoints (API). The mechanism is very similar to the one currently [used](https://github.com/mxcube/mxcubecore/blob/3a2120102f8377adafdac3d0d7c4658063d51520/mxcubecore/HardwareObjects/ISPyBRestClient.py#L76), except that the Keycloak token is passed instead of the username and password.

The next diagram shows an example of how to authenticate and retrieve information from ICAT.

```sequence
User-->MxCubeWeb: opens
MxCubeWeb-->Keycloak: opens
Keycloak->LDAP: authenticate(username, password)
LDAP->Keycloak: authentication response
Keycloak->MxCubeWeb: Keycloak token (Ktoken)
MxCubeWeb->ICAT+: authenticate(Ktoken)
ICAT+->MxCubeWeb: ICAT token (Itoken)
MxCubeWeb->ICAT+: getSamples(Itoken, proposal)
```

One security advantage is that the username and password never pass through our application, which might be susceptible to compromise; they circulate only between Keycloak and LDAP.
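The last two steps of the sequence above (exchange the Keycloak token for an ICAT token, then request the samples of a proposal) can be sketched as follows. This is only an illustration: the endpoint paths `/session` and `/samples`, the JSON field names `token`, `sessionId` and `proposal`, and the class name `IcatSession` are all hypothetical and do not describe the real ICAT+ API. The HTTP transport is injected so that the flow can be exercised without a server.

```python
import json
import urllib.request
from typing import Any, Callable, Dict, List, Optional

# Transport signature: (url, payload) -> decoded JSON reply.
JsonPost = Callable[[str, Dict[str, Any]], Any]


def http_post_json(url: str, payload: Dict[str, Any]) -> Any:
    """POST ``payload`` as JSON and decode the JSON reply (standard library only)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class IcatSession:
    """Holds the ICAT token obtained from a Keycloak token.

    The application never sees the username or password: it only receives
    the Keycloak token and forwards it once.
    """

    def __init__(self, base_url: str, post: JsonPost = http_post_json) -> None:
        self.base_url = base_url.rstrip("/")
        self._post = post
        self.icat_token: Optional[str] = None

    def authenticate(self, keycloak_token: str) -> str:
        # HYPOTHETICAL endpoint and field names, for illustration only.
        reply = self._post(f"{self.base_url}/session", {"token": keycloak_token})
        self.icat_token = reply["sessionId"]
        return self.icat_token

    def get_samples(self, proposal: str) -> List[Dict[str, Any]]:
        if self.icat_token is None:
            raise RuntimeError("authenticate() must be called first")
        # HYPOTHETICAL endpoint and field names, for illustration only.
        return self._post(
            f"{self.base_url}/samples",
            {"sessionId": self.icat_token, "proposal": proposal},
        )
```

In MxCubeWeb, the Keycloak token would come from the SSO redirect; a call such as `IcatSession(url).authenticate(ktoken)` followed by `get_samples(proposal)` would then mirror the `authenticate(Ktoken)` and `getSamples(Itoken, proposal)` arrows of the diagram.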
### Backend

The information about the sample, diffraction plan and shipping is currently retrieved from ISPyB.

As far as I can see in the MxCube repo, the interface lives in [lims.py@MxCubeWeb](https://github.com/mxcube/mxcubeweb/blob/develop/mxcubeweb/core/components/lims.py).

The "backend" part can be found in [ISPyBClient.py@MxCubeCore](https://github.com/mxcube/mxcubecore/blob/develop/mxcubecore/HardwareObjects/ISPyBClient.py).

A quite pragmatic approach could be to override certain methods to use ICAT instead of ISPyB, as is done [here](https://github.com/mxcube/mxcubecore/blob/develop/mxcubecore/HardwareObjects/ALBA/ALBAISPyBClient.py). This approach would allow easy switching from ISPyB to ICAT, and other facilities using ISPyB would not be impacted. The main drawback is that we would keep the same objects as in ISPyB, which are tightly coupled to the database schema. If this approach is chosen for its short-term advantages, the code should still be simplified in the long run.

### User interface

The MxCubeWeb UI might need adjustment to use the new sample tracking system.

There are two major **differences** in the new sample tracking compared to the one used in ISPyB:

1. There is a single shipment for each session, which can contain multiple parcels
2. There is no processing status for the dewars

The goal of both changes is to simplify the software and make it easier for users.

The next flow diagram shows the different actions that need to be taken to choose the dewar you want to collect from:

```flow
st=>start: Start
e=>end
retrieve_proposals=>operation: Retrieve Proposals
retrieve_sessions=>operation: Retrieve Sessions
is_session_scheduled_now=>condition: Scheduled now?
session_selected=>inputoutput: Session selected
auto_select_session=>operation: Auto select session
session_list=>parallel: Show session list
select_session=>parallel: Select session
change_session=>operation: Change session
retrieve_parcels=>operation: Retrieve parcels
select_parcel=>inputoutput: Select parcel
is_session_confirmed=>condition: Confirm session?
st->retrieve_proposals->is_session_scheduled_now
is_session_scheduled_now(yes)->auto_select_session->session_selected->is_session_confirmed->e
is_session_scheduled_now(no@No session found)->session_list->session_selected
is_session_confirmed(yes)->retrieve_parcels->select_parcel->e
is_session_confirmed(no@change)->session_list
session_list(path1, bottom)->session_selected
change_session(path1, bottom)->session_list
```

## 3.- Implementation

There are a few calls to ISPyB that would need replacement:

| Service | Method | Description |
| -------- | -------- | -------- |
| ToolsForBLSampleWebService | storeOrUpdateBLSample | |
| ToolsForBLSampleWebService | findSampleInfoLightForProposal | |
| ToolsForBLSampleWebService | findBLSample | |
| ToolsForShippingWebService | findPersonByProposal | |
| ToolsForShippingWebService | findProposal | |
| ToolsForShippingWebService | findLaboratoryByCodeAndNumber | |
| ToolsForShippingWebService | findLaboratoryByProposal | |
| ToolsForShippingWebService | findPersonByLogin | |
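The pragmatic approach described in the Backend section, overriding only selected methods so that they use ICAT while the rest keeps talking to ISPyB, could look roughly like the sketch below. Both classes are simplified stand-ins written for this document and are not the real mxcubecore classes. The SOAP service and method names come from the table above, but the method signatures, the injected callables and the returned keys (`sampleName`, `proteinAcronym`) are assumptions.

```python
from typing import Any, Callable, Dict, List

# Assumed transport signatures, injected so the sketch runs without servers.
SoapCall = Callable[..., Any]                           # (service, method, *args)
IcatGetSamples = Callable[[str], List[Dict[str, Any]]]  # (proposal) -> samples


class IspybClientStub:
    """Simplified stand-in for the existing ISPyB client (NOT the real class)."""

    def __init__(self, soap_call: SoapCall) -> None:
        self._soap_call = soap_call

    def find_proposal(self, code: str, number: str) -> Any:
        return self._soap_call(
            "ToolsForShippingWebService", "findProposal", code, number
        )

    def find_sample_info_light_for_proposal(self, proposal: str) -> Any:
        return self._soap_call(
            "ToolsForBLSampleWebService", "findSampleInfoLightForProposal", proposal
        )


class IcatClientSketch(IspybClientStub):
    """Overrides the sample lookup only; every other call is inherited."""

    def __init__(self, soap_call: SoapCall, icat_get_samples: IcatGetSamples) -> None:
        super().__init__(soap_call)
        self._icat_get_samples = icat_get_samples

    def find_sample_info_light_for_proposal(self, proposal: str) -> List[Dict[str, Any]]:
        # Keep an ISPyB-shaped return value so existing callers are unaffected.
        # The key names on both sides are hypothetical.
        return [
            {"sampleName": s["name"], "proteinAcronym": s.get("protein")}
            for s in self._icat_get_samples(proposal)
        ]
```

This also makes the drawback mentioned in the Backend section visible: the ICAT results have to be reshaped into ISPyB-style objects, so the coupling to the ISPyB schema remains until the interface itself is simplified.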