# Revolutionizing TU Delft's Data Management: ManGO with iRODS and SURF

## 1. Introduction

### 1.1. General information

In today’s digital age, the volume of data being generated and processed is growing at an unprecedented rate, creating new challenges for storage, curation, and distribution. Whether in scientific research, healthcare, business operations, or countless other domains, effective data management has become paramount. Because almost all research generates or reuses data, there is a need for a comprehensive data management plan that includes strategies to extract valuable insights and support decision-making. Over the years, significant progress has been made in the field, such as labeling standards, metadata forms, and cloud backups, but it is not yet enough. Finding specific data among many files, protecting confidential data, and making data accessible to other users remain unsolved problems or are handled manually, which is time-consuming.

:::warning
:warning: Alerts!

It is common for scientists to experience data loss at least once in their careers, often because of missing regular backups and synchronization or the inadvertent overwriting of original files. Recovering the lost data after such incidents can be very difficult.

A lack of comprehensive metadata in interdisciplinary research projects frequently causes confusion, making it hard for researchers to navigate and interpret diverse datasets and hindering collaborative progress.

Another common problem is data fragmentation: when data is poorly organized, crucial information can be lost or misplaced, and vital data becomes hard to locate and retrieve when it is needed.
:::

### 1.2. iRODS

Data management software and protocols help scientific communities apply the FAIR (Findable, Accessible, Interoperable, and Reusable) principles to data management and address the problems mentioned above. Several data management software packages and platforms have been developed to meet the challenges faced by organizations and research projects. The Integrated Rule-Oriented Data System (iRODS), an open-source tool, stands out by providing a holistic and adaptable data management solution that caters to the intricate requirements of diverse research disciplines and industries.

Compared to other data management software such as CKAN, iRODS offers data virtualization for seamless access to distributed resources, enforcement of data policies, scalability and performance in large-scale data environments, robust metadata management, customization, and integration capabilities. What sets iRODS apart is its rule-based policy system, which lets users tailor data management policies to their specific research or industry needs. This customization and adaptability allows iRODS to serve a wide range of use cases, from scientific research to industry-specific data management.

iRODS also facilitates interdisciplinary collaboration. By offering a shared platform for data management, it bridges the gap between research fields and enables data from different sources to be shared and integrated. Furthermore, its open-source nature fosters extensibility and customization, so it can evolve with the changing data management landscape. This makes iRODS a valuable asset for those seeking a versatile and collaborative solution to their data management challenges.

However, iRODS can be challenging for users who have little interest in coding or in working with command-line interfaces in a plain terminal.
Figure 1 depicts the iRODS environment. Therefore, combining iRODS with a user-friendly interface such as ManGO and with SURF's capacity to host large-scale projects would provide a FAIR, fast, and easily manageable solution.

![Capture-1](https://hackmd.io/_uploads/Sk4uKE7Tp.png)

`Figure 1. Ubuntu terminal with connection to iRODS.`

### 1.3. Background

TU Delft is part of the geothermal project DAPwell on the TU Delft campus. The goal of the project is to facilitate geothermal research and its application, and to use heat from the earth to heat buildings. The geothermal well is installed near the TU Delft campus, which makes it part of a large-scale research program. In this project for Delft University of Technology, we plan to integrate ManGO, powered by iRODS, with the SURF hosting service to provide a better data management package for the DAPwell geothermal energy project.

## 2. ManGO

### 2.1. User-friendly interface

ManGO, powered by iRODS, is a robust and user-friendly active research data management solution. As a web-based interface, ManGO makes iRODS accessible and intuitive, and simplifies data management for users with varying levels of technical expertise. Several features make ManGO suitable for large projects: access from anywhere at any time, previews of data and its metadata (e.g., size, modified date, content, format, and owner), different levels of access to data, and search for specific items within large amounts of data. The following sub-sections describe the main structure and tabs of ManGO. Figure 2 shows the main menu of ManGO. The following information is provided by KU Leuven, the developer of ManGO.

![Capture-2](https://hackmd.io/_uploads/B1i3Uv766.png)

`Figure 2. A collection view in ManGO.`

### 2.1.1. Collections

In iRODS/ManGO, the “Collection” tab is central to data management. It lets users handle collections of data efficiently. Collections are logical groupings of data objects and other collections, organized hierarchically like folders in a traditional file system (Figure 3). From this tab, users can create, copy, move, or delete collections. Each collection is shown with its name, owner, creation and modification timestamps, and size. Users can also add metadata to collections to give them context. Access control is handled through the “Permission” tab, where users determine who has access to a collection. Together, these features let users structure, manage, and interact with their data within the iRODS/ManGO environment.

![Capture-3](https://hackmd.io/_uploads/B1Fbxd7pT.png)

`Figure 3. Options of a collection page.`

### 2.1.2. Collection permissions

Within the “Permission” tab, collection owners can precisely control access to a collection (Figure 4). They can edit access permissions by adding or removing users who have already been defined. This control lets owners tailor access rights so that the right individuals can interact with the uploaded data according to their specific needs and requirements.

![Capture-4](https://hackmd.io/_uploads/By3ySPEaa.png)

`Figure 4. Grouping and collection permissions on ManGO.`

### 2.1.3. Collection metadata

In ManGO, collection metadata provides context and description for collections, improving their organization and comprehension. This metadata typically consists of three main components.
The “Name” is a concise, descriptive label for the metadata field and indicates the type of information being added. The “Value” holds the specific detail associated with that field. In some cases, a “Unit” is included to denote the measurement or scale of the value, which is particularly useful for quantitative metadata. Together, these components let users enrich their collections with structured, meaningful information that makes data easier to organize and retrieve.

### 2.1.4. Data properties

In ManGO, the “Data Properties” tab provides detailed information about individual data items within a collection. By clicking on a data item, users can see its owner, creation and modification dates, size, internal ID, status, and the checksum used for backup verification (Figure 5). These properties give an overview of the data’s history and attributes and help users make informed decisions about their data.

![Capture-5](https://hackmd.io/_uploads/H1fJIwNa6.png)

`Figure 5. Data properties of a file on ManGO.`

### 2.1.5. Metadata for data

This tab offers functions similar to those described in section 2.1.3, but applies to individual data items rather than collections.

### 2.1.6. Data permission

This tab offers functions similar to those described in section 2.1.2, but applies to individual data items rather than collections.

### 2.1.7. Data preview

The “Data Preview” section in ManGO lets users view and interact with uploaded data.
It functions as a user-friendly interface for exploring and examining data. The preview is available for specific formats, such as .jpg, provided the file is smaller than 200 MB (Figure 6). This feature provides quick access to essential information and streamlines data handling within ManGO.

![Capture-6](https://hackmd.io/_uploads/rka03wETT.png)

`Figure 6. Data preview with ManGO.`

### 2.1.8. Metadata inspection and extraction

Metadata inspection and extraction in ManGO lets users examine the metadata embedded in their data (Figure 7). With metadata inspection, users can view essential information about their data, including ownership, creation and modification timestamps, file size, status, and backup checksums. Metadata extraction pulls relevant metadata out of files, making the data easier to categorize, search, and retrieve. Together, these capabilities help users make informed decisions and organize their data.

### 2.n. TUD iRODS architecture and infrastructure

The iRODS instance of Delft University of Technology has been installed on Amazon Web Services (AWS). It is accessible from any operating system using ‘aws-key’ certificates and a specific IP address. This infrastructure allows users to securely manage and access data hosted on the Amazon server. The decision to host iRODS on an Amazon server was made because of the advantages AWS provides.
AWS offers scalability (server resources can be scaled up), reliability (low server downtime), security (firewalls and encryption), cost-effectiveness, and a vast ecosystem of services and tools (low latency thanks to worldwide data centres, and machine learning capabilities), making it a suitable choice for the performance, accessibility, and data management capabilities required for iRODS at Delft University of Technology.

Additionally, to enhance data security and redundancy, we have opted to keep data storage separate, on SURF. This ensures that our critical data remains protected and backed up in a reliable and trusted environment. After submitting a proposal to SURF for iRODS Community Edition (SD-58150), we reached an agreement to use SURF’s storage infrastructure for data uploaded to iRODS.

Drawing on past experience and recognizing that not everyone is comfortable with coding or using icommands, we requested the use of ManGO, a user-friendly web-based interface, and KU Leuven approved this request. At present, ManGO is operational and accessible upon request. It continues to receive technical updates, including enhancements to security, login configuration, and the development of new policies.

We have also established a second iRODS server dedicated to implementing updates, debugging, and testing. This ensures that modifications are thoroughly examined and refined before being applied to the primary iRODS server hosted on AWS, reducing the risk of downtime. Figure 12 shows the flowchart of ManGO at TU Delft.
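
The backup verification that this setup relies on (the checksum listed under data properties, and the copies kept on SURF storage) can be illustrated with a short sketch. It assumes the checksum is written in the form commonly produced by the default SHA-256 scheme in iRODS, that is `sha2:` followed by the base64-encoded digest. The function names are hypothetical and are not part of ManGO or iRODS, and the scheme actually configured on the TU Delft server may differ.

```python
import base64
import hashlib
from pathlib import Path


def irods_style_checksum(data: bytes) -> str:
    """Return a checksum string in the assumed 'sha2:<base64 SHA-256>' form."""
    digest = hashlib.sha256(data).digest()
    return "sha2:" + base64.b64encode(digest).decode("ascii")


def file_checksum(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute the same checksum for a file, reading it in chunks
    so that large research files do not have to fit in memory."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return "sha2:" + base64.b64encode(hasher.digest()).decode("ascii")


def copy_is_intact(local_checksum: str, stored_checksum: str) -> bool:
    """A stored copy is considered intact when both checksums match exactly."""
    return local_checksum == stored_checksum
```

Comparing the checksum computed locally before upload with the one reported in the “Data Properties” tab is one way to confirm that a stored copy matches the original.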