# JointAI

# Project details:

**Applicants:** Riccardo Taormina, Artur Schweidtmann

**Faculty:** Civil Engineering and Geoscience, Applied Science

**DCC members:** Nikolaos Sourlos, Josip Grgurić, João Guimarães, Raúl Ortiz

**Additional Team Members:** Sophia Rupprecht, Greg Kyritsakas, Job van der Werf

**Support period:** February 2024 - September 2024

**Repositories:** [Dataset(s)](https://huggingface.co/draco-ai), [Codebase](https://github.com/rtaormina/DRACO-phase1)

**Project Description:**

This project was based on two separate applications. Briefly, both applications requested a structured system to organise and interpret complex data. In addition, they needed an integrated workflow to handle such data. The main challenges revolve around data:

- **Source:** either public or private
- **Type:** text, code, perhaps more complex formats
- **Storage:** volumes and structures
- **Usability:** structure, versioning, availability for algorithmic use and peer review

**Project goals:**

The DCC proposed implementing and testing different platforms. Once an appropriate platform and configuration were chosen, the applicants deployed their own instances, populated them with domain-specific datasets, and used them for their own research questions and domains.
The aims of the DCC support were:

1) Set up appropriate data infrastructure
2) Establish a data handling workflow
3) Facilitate efficient dataset retrieval and usage

## Project Results

1) Set up appropriate data infrastructure:
   - Explored institutional and public alternatives
   - Documented a (set of) solution(s) implemented by the ChemE and CiTG research groups independently
2) Establish a data handling workflow:
   - Determined data types to be handled (all public, text and code)
   - Set up and documented guidelines
3) Facilitate efficient dataset retrieval and usage:
   - Tested, implemented, and documented different options to run model(s)
   - Ongoing development of criteria to evaluate the effect of dataset(s) on the initial model(s)

**Links to output**

- Dataset peer-review workflow
  - [Contributing guidelines](https://huggingface.co/datasets/draco-ai/trial01/blob/main/CONTRIBUTING.md)
  - [Upload form prototype](https://huggingface.co/spaces/draco-ai/DataManager/tree/main)
  - [Automated validation](https://github.com/rtaormina/DRACO-phase1/blob/main/notebooks/peer_review/test_data_validation.ipynb)
  - [Peer-review workflow](https://github.com/rtaormina/DRACO-phase1/blob/main/notebooks/peer_review/peer_review_workflow.ipynb)

  ![uzjCu0JwUybcsuDRmGMVz](https://hackmd.io/_uploads/By_ML9P-kx.jpg)

- Model performance assessment
  - [Jupyter notebook example](https://github.com/rtaormina/DRACO-phase1/tree/main/notebooks/evaluate_llms)

### Feedback from researchers

>"Throughout the project we learned many things concerning our view related to DRACO and what tools to use to implement it; we also learned better how to collaborate with other groups."

>"...the collaboration with DCC and <other PI> unlocks synergies. There is potential for joint future projects and also research collaborations."

>"Initially, I was applying with a completely different project. So, my initial expectations were not met. However, we also found value in the new, joint project.
I am glad that I could link this project to <student's> research. She really benefitted from the project."

>"The project has led to many valuable insights and to preliminary results using existing data."

### Lessons learned

- It is essential to have someone working regularly with the DCC, making the most of their time and guiding the next steps based on the project requirements and expectations.
- With this project, the DCC gained experience in how to onboard, operate, collaborate, and manage an evolving project with more than one research group involved.