# Draft Review Document

This document is an interim evaluation of the LabKey platform as used for the Benchmarking Pilot project. It highlights the platform's strengths and considerations for adoption by users, teams, and programs.

\[Insert Screenshot of Benchmarking Pilot dashboard\]

---

## Data Flow Overview

The Benchmarking Pilot uses LabKey to give analysts easier access to data, strengthening the program's ability to support, conduct, and share data analytics and informatics projects within the program and the wider community. The diagram below shows the current and target data flows that the pilot aims to leverage.

\[Insert diagrams of data flow\]

---

## Capabilities

In addition to the capabilities and functionality that LabKey advertises on the company's website, the following capabilities have been particularly useful in the Benchmarking Pilot so far.

_\[Add a table that breaks down the LabKey platform functionality in terms of the bullets below\]_

* Access: How easily users can obtain, work with, and share data.
* Usability: The platform's intuitiveness and ease of use.
* Speed: Processing-wise, how quickly the platform processes data. Community-wise, how quickly the platform can be used to create, update, and share information.
* Analytics: Support for different kinds of data-related analysis tasks.
* Development: Support for different kinds of programming languages.

In the subsequent sections, each **strength** or **consideration** ties back to at least one of these capabilities. Any **strength** or **consideration** that relates to software development assumes a certain level of technical capability. Software development here includes writing scripts (e.g., in R or SAS), creating and querying schemas, and creating objects in LabKey that require more than point-and-click knowledge.


---

## Strengths

The table below describes strengths that the team has identified during the project so far. Strengths are items that the team found particularly helpful for working with data, analyzing data, developing analysis scripts, and sharing data artifacts.

_\[Insert table of strengths with the following format\]_

Strength | Description
:-- | :--
Name of strength | &bull; Colored bullet describing this strength and how it affects the associated group<br/>&bull; Here is another bullet describing how it affects a different group
Central | The platform acts as a single, go-to source for analyses and information sharing.
Robust analytics | The platform allows for reusable and reproducible analyses. The risk of introducing slightly different analyses and errors is reduced.
Flexible storage | Data sets ranging from small (e.g., 20 observations) to large (e.g., 22 million observations) can be created, stored, and accessed.
Multiple data type support | The platform supports multiple spreadsheet and data-frame-style file types that analysts already use. These types are unified into a common schema type. Conversely, data stored in the common schema type can be exported to multiple file types (e.g., CSV, Excel).
Support for multiple data connections | Data can be added to schemas in LabKey through multiple methods, including point-and-click, drag-and-drop, scripting, other LabKey instances, external schema connections (structured data), and the file watcher (unstructured data).
Remote and local analytics | Users can conduct analyses on the platform or on their local machine via a direct connection to project folders on LabKey.
Support for multiple analysis pipelines | Analysts can create analytics pipelines that run manually or automatically (e.g., triggered when new data are uploaded).
Remote access | Users with read-only access can pull data to work locally. Users with read-write access can update LabKey with local data. Both are done programmatically (e.g., from RStudio) instead of through a browser.
Security | Data are securely stored not only on the vendor's server, but also behind LabKey's software.
Extensible | LabKey acts as a blank canvas. Developers can build virtually any kind of application within the confines of the platform (e.g., data collection, data analytics, data dashboards).
Shareable | Account holders can view the latest reports on LabKey. Updates are immediate. Alternatively, users can easily download data artifacts (e.g., data visualizations) to share.

---

## Considerations

The table below describes considerations that the team has identified during the project so far. Considerations are items that the team has noted to address in the future, found challenging to work with, or considers useful to include in any future training for LabKey users.

_\[Insert table of considerations with the following format\]_

Consideration | Description
:-- | :--
Name of consideration | &bull; Colored bullet describing this consideration and how it affects the associated group<br/>&bull; Here is another bullet describing how it affects a different group
VPN interference | The NCI VPN disables much of the site scripting that IMS set up for the https://labkey-srp-uat.imsweb.com/labkey portal. The LabKey URLs need to be whitelisted if an organization's VPN blocks them automatically.
Memory and storage | The Benchmarking Pilot's LabKey instance shares the IMS-provided server with other IMS projects and software. At first, not enough memory and storage were allotted to the Benchmarking Pilot. After iterating with IMS, we were able to raise memory and storage to adequate amounts. Additional instances need to be provisioned with enough memory and storage for the task.
Excel file size | This is more a limitation on the amount of data that Excel can handle. Users should be prevented from downloading large datasets, because the files generated from them are not easily usable.
Account creation | Requesting accounts currently happens in two stages: first with the program, then with IMS.
Login | Logging into the pilot instance requires going through two login screens, which circumvents the built-in LabKey login screen.

---

## Other Notes (Nota Bene)

These are notes that the Benchmarking Pilot team collected throughout the AIMs worked on so far. These items did not fall into either the **strengths** or **considerations** categories. Some items are earmarked for inclusion in any potential training or usage instructions for future projects.

Note | Description
:-- | :--
API keys | API keys are required for remote-to-local connections and data analytics. Keys expire 90 days after they are created.
Technical capability | Depending on the depth and detail involved in LabKey development, program staff may need certain technical knowledge and expertise. Training sessions and interactive courses can be provided to raise the technical capability of program staff.
Development Life Cycle | As in many projects, developing and programming data analytics, queries, and dashboards requires time to design and implement. Sometimes what seems like a simple task (e.g., just adding a button to the screen) is actually quite involved (e.g., data connections need to be updated, new scripts need to be created, hooks need to be in place). This is not meant to discourage innovation, but to manage expectations for development and implementation.
Sharing | End users who want to access the dashboard need to request read-only accounts.
Roles and accounts | Only specific roles allow both reading and writing (i.e., developing content). Target connections (e.g., registries) need to have login credentials in place beforehand.
External partners | Likely for AIM 4, it would be good to identify partner registries that would be open to testing the external connection capabilities of LabKey.

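
The 90-day API key lifetime noted in the table above lends itself to a simple expiry check that could be included in training material. The following is a minimal sketch in Python using only the standard library. The function names, the 14-day warning window, and the example dates are hypothetical; the 90-day lifetime is taken from the note above and should be confirmed against the instance's actual settings.

```python
from datetime import date, timedelta

# Assumption: 90-day lifetime as stated in the notes table; the real value
# is set by the LabKey instance administrator and may differ.
API_KEY_LIFETIME_DAYS = 90


def key_expiry(created: date, lifetime_days: int = API_KEY_LIFETIME_DAYS) -> date:
    """Return the date on which a key created on `created` expires."""
    return created + timedelta(days=lifetime_days)


def days_remaining(created: date, today: date,
                   lifetime_days: int = API_KEY_LIFETIME_DAYS) -> int:
    """Return days left before expiry; zero or negative means expired."""
    return (key_expiry(created, lifetime_days) - today).days


def needs_renewal(created: date, today: date, warn_days: int = 14,
                  lifetime_days: int = API_KEY_LIFETIME_DAYS) -> bool:
    """Return True when the key expires within `warn_days` (hypothetical window)."""
    return days_remaining(created, today, lifetime_days) <= warn_days


if __name__ == "__main__":
    created = date(2024, 1, 1)   # hypothetical creation date
    today = date(2024, 3, 20)    # hypothetical "current" date
    print("Expires on:", key_expiry(created))
    print("Days remaining:", days_remaining(created, today))
    print("Renew soon:", needs_renewal(created, today))
```

Analysts who connect to LabKey from scripts could record the key's creation date alongside the key and run a check like this before long-running jobs, so that an expired key does not interrupt an analysis partway through.
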

---

## Recommendations

The LabKey platform is a user-friendly, accessible, and open platform that is welcoming to users and developers of different backgrounds and technical capabilities. The recommendations identified so far would facilitate certain LabKey tasks (see Table X).

Recommendation | Description
:-- | :--
Memory and storage | Estimate data size before requesting space on the server so that resources can be allocated appropriately.
Data format | The project should identify the data structure and data types it is working with, as well as the depth at which the source data are provided (e.g., raw XML files vs. DB schemas).
External connections | A single LabKey instance, or a set of instances, should be available to test connections against.
Training | Teams interested in using LabKey would benefit from orientation and training on the different LabKey functions.
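
As an illustration of the memory and storage recommendation, a rough size estimate can be made from the expected number of rows and columns before requesting server space. The sketch below is a back-of-the-envelope calculation in Python, not a LabKey feature. The column count, the 8 bytes per value, and the overhead factor are all assumptions to be replaced with figures from the actual data; only the 22 million row count comes from this document.

```python
def estimate_bytes(rows: int, columns: int, bytes_per_value: int = 8,
                   overhead: float = 1.0) -> int:
    """Rough size estimate: rows x columns x bytes per value x overhead.

    `bytes_per_value` (8, i.e. one 64-bit number) and `overhead` (a multiplier
    for indexes, copies, and so on) are assumptions, not measured values.
    """
    return int(rows * columns * bytes_per_value * overhead)


def to_gib(n_bytes: int) -> float:
    """Convert bytes to gibibytes (1 GiB = 1024**3 bytes)."""
    return n_bytes / 1024 ** 3


if __name__ == "__main__":
    rows = 22_000_000   # largest data set size mentioned in this document
    columns = 50        # hypothetical column count

    raw = estimate_bytes(rows, columns)
    padded = estimate_bytes(rows, columns, overhead=1.5)  # hypothetical 50% overhead

    print(f"Raw estimate:    {to_gib(raw):.1f} GiB")     # roughly 8.2 GiB
    print(f"Padded estimate: {to_gib(padded):.1f} GiB")
```

Text-heavy columns, indexes, and intermediate analysis outputs can take considerably more space than this simple product suggests, so the estimate is best treated as a lower bound when discussing allocations with the server provider.
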