# GaiaNet Internal

## Resources

- [GaiaNet x Boardroom Proposal](https://hackmd.io/yAkuQDqWTx-Xw5YcwfBXqQ?view)
- [GaiaNet x RaidGuild: Boardroom API integration](https://hackmd.io/g6NVSPosQOOLM4PVWbdngg?both)
- [Statement of Work](https://docs.google.com/document/d/147cdfexRIupp-1ZZwO-wF4rxbK-n1qzSB_KjaiaDZOk/edit)
- [GaiaNet on Huggingface](https://huggingface.co/gaianet)
- GitHub repository: https://github.com/raid-guild/gaianet-rag-api-pipeline
- Documentation site: https://raid-guild.github.io/gaianet-rag-api-pipeline
- Execution plan for generating the knowledge base for all DAOs/protocols: https://github.com/raid-guild/gaianet-rag-api-pipeline/tree/main/boardroom

## Estimated Costs

- Client Management (10%):
- Project Management:
- Engineering x 2:
- Spoils (10%):

### Development and Testing

**Total**: 3 Weeks (12 working days, 3 days padding)

#### GaiaNet Node

3 days to install scripts and create entrypoints

#### Boardroom Middleware

5 days development and testing

#### Demo App

4 days development and testing

### Refinement and Documentation

**Total**: 1 Week

#### Refinement

3 days iteration based on demo feedback

#### Documentation

2 days

---

## Meetings

Delivered to client on September 19

### October 25, 2024

- Setup is underdeveloped
- Broken links in docs
- Reduce the steps to development
- 10 steps in the README is a turn-off
- Cut it down
- The links in every step are confusing
- There are too many keys
- Move the Ollama stuff to another README
- Want people to use it at hackathons
- The only thing they care about is the developer experience
- Getting 404 errors on API examples
- Make the OpenAPI process simpler
- Create a Docker image
- Cannot load the Gaia node on hardware without long execution times
- Pipeline execution is faster when you only have to load embeddings

### September 24, 2024

**Team Retro**

#### What Worked Well?

![image](https://hackmd.io/_uploads/HJragVnlyx.png)

#### What Needs Improvement?
![image](https://hackmd.io/_uploads/HkMkbNnx1l.png)

#### What Would We Change?

![image](https://hackmd.io/_uploads/B1QZ-Ehlyx.png)

#### Other Notes

- Funds should be in escrow prior to work being done
- Our estimation was pretty sad
- Got pushback from Cleric
- Should not base costs on potential for future work
- Docs got sloppy by saving them for last
- Could improve how tasks were handed off to each other
- Project board worked really well
- Team was the right size
- All the pre-research really helped
- Weekly meetings worked well
- Having a daily standup would have helped
- "Best Monk ever in RaidGuild"
- Appreciate how you were willing to get up front and in the face as the voice of the devs

### September 17, 2024

- [First payment](https://etherscan.io/tx/0x126b08781cccc5ea8b3897693f9abc04208544dad79477ca8589f3c09a64fc90) was received on August 6.
- We estimated the project [timeline](https://hackmd.io/yAkuQDqWTx-Xw5YcwfBXqQ#Timeline) would be 4 weeks
- We are 6 weeks in right now and had an unofficial first week before the project was started
- So right now we have 7 weeks committed and we need this week to wrap things up

---

- Client is hands off
- Sydney is currently on the conference circuit in Southeast Asia
- They do not seem to have an appetite for another demo
- We just need to get them the docs and transfer the repo once the final payment transaction is made

---

- Santiago has been testing the repo with the Agora API
- Identified some improvements that needed to be made to make it more generalizable
- Was breaking things but got everything fixed
- Took longer than expected but is working well now
- The pipeline can extract from multiple sources
- He pushed the changes and Sayonora is testing with Aave to get an example for the docs
- Need to double-check everything is working
- Implementation is done
- End-to-end testing with sample protocols
- Need to work on execution plan for other DAOs
- Santiago will write a script to define the plan
- Have to
have the docs completed before we can test the knowledge base
- Santiago will have the docs done by the end of today
- Santiago is automating the source code docs with ChatGPT and will have them done tomorrow morning
- The execution plan will be done by the end of day tomorrow (Wednesday)
- We will not implement automated testing for this project
- We have completed manual testing and can recommend automating it as a good next step
- We do not believe automated tests were a line item we agreed to deliver
- Need to double-check the docs for styling
- Sayonora used Vocs and deployed to Vercel
- Santiago will fork Sayo's repo and add it to the one we are transferring to the client
- He will make sure the README is updated so anyone knows how to get it set up
- We should be ready to transfer on Thursday
- But should wait to transfer it until the second payment is received
- We are confident everything on the project board will be moved to Done by tomorrow
- We will keep our meeting on the calendar for next week to have a retro

### September 10, 2024

- Sayonara is running end-to-end tests on his computer
- Have had some parallelization issues with Ollama
- Santiago has been able to get the parameters working locally
- We are trying to determine what the recommended parameters are by finding the limits and then dialing them down
- Found a few issues that can be improved when testing the Agora API
- Sayonara started the documentation on Docusaurus
- Santiago will review the docs and add the source code documentation
- We need some sample questions for the client presentation
- Are confident we will have all but one issue done this week
- Will be ready to demo on Friday
- Should schedule an internal review on Thursday (maybe at Round Table)
- Can complete the retro during next week's Planning Meeting and request payment

### September 3, 2024

- Santiago had to make some last-minute changes yesterday
- Found a bug in how we generate manifests and had to change the logic
- The
pipeline is working!
- We need a node capable of processing this info so we can test the pipeline and finish up the docs
- Sydney sent the specs for nodes
- Can they provide a capable node?
- Do we need to set up a cloud instance?
- Is this within the scope of this project?
- GPU capabilities are what is most expensive
- Installing the Gaianet node is not hard
- Santiago is not experienced with cloud infra
- But this should only take a half day
- We can have the node up by the end of day tomorrow
- And finish end-to-end testing this week
- Docs can be done in parallel
- We will be ready to ship and demo next Tuesday
- We should wait until the cloud infra is in place before scheduling the meeting
- Make sure to showcase the user-facing features rather than the technical details

#### Todo

- [x] Create issue to [Enable Logging Module](https://github.com/raid-guild/gaianet-rag-api-pipeline/issues/39)
- [x] Create issue to [Deploy a Gaia Node Cloud Instance](https://github.com/raid-guild/gaianet-rag-api-pipeline/issues/38)

### September 3, 2024 (Demo)

- [Notes](https://app.fireflies.ai/view/Boardroom-Gaia-net-Integration-Demo::8SwjC3USp9WYKpE5)
- [Slides](https://hackmd.io/@santteegt/ByoykY4nC#/)

Maria and Carlos are Web 2.0 engineers learning about Web3. They appreciate us quantifying the number of issues and the effort required to implement this. Documentation is important as they want other developers building on top of this. They requested we send them the notes, slides, and our plans for finishing up the project.

---

Our project aims to integrate the Boardroom API with GaiaNet, unlocking access to DAO governance data. Since kicking off on August 6, we have made significant progress, completing 24 issues. We integrated Airbyte to streamline data extraction, providing a solution capable of handling diverse authentication methods and pagination strategies.
The pipeline is operational, effectively interacting with the Boardroom Governance API to retrieve and process data. We have exported data embeddings into snapshots, paving the way for data analysis and enhanced capabilities. The project has not been without its challenges, but we have made a ton of progress in a short amount of time, and I would love to hand it over to the real stars of the show to demo what we have completed.

### September 2, 2024

#### Big Blocker

- Not good news
- Big blocker on Gaia Node
- Nodes are crashing when we run the pipeline
- Happening for both Santiago and Sayonora
- Pipeline is working
- The problem is with the amount of records
- Minor issue with filters not working with Boardroom API
- We are also getting inconsistent numbers of records returned
- Santiago tried using public nodes with a single worker
- Pipeline was running for more than an hour and crashed
- Results got worse when running locally (even with an increased number of workers)
- The problem seems to be with the Gaia node
- We might try to run it with ChatGPT to see if we have the same problems

#### Todo

- Close out any issues that are not blocked
- Add `blocked` label to issues we cannot make progress on
- Gather list of issues on Gaianet side and Boardroom side
- Figure out how much time we need to finish this project

#### Demo Tomorrow

- Discuss the project board
- Show them the architecture
- Walk through the process
- Use a smaller dataset
- Show that the pipeline is working
- Discuss the issues with scaling
- Determine what guidelines the Gaianet team can provide to help us process

### August 27, 2024

- Everything good from client side
- They are happy with the progress
- On track for next week's demo
- Sasquatch will try to get the date confirmed
- We can schedule a demo prep on Monday
- Okay to close the issues in Done
- Output connector did not take too much time to implement but required a lot of testing
- Ran into some issues related to versions of the
Gaianet node and Qdrant DB
- We exported the embeddings into a snapshot
- Need to test the node with the snapshot
- Outdated version could not load the snapshot
- Currently having issues testing the Gaianet Node in pipeline
- Not getting responses from LLM even with the demo
- Was taking about 3 minutes to get a response previously, not getting any response at all right now
- We cannot optimize the pipeline if we cannot test the node
- Will be able to optimize parsing once the other issues are closed
- Sayo passed the OpenAPI spec off to Santiago to complete
- Hope to have the script to map the API endpoints done by the end of today
- Once the entire pipeline is working we can focus on the CLI and Docker
- Added some DAOs for e2e testing
- And will focus on the Agora OP for the other API
- Allo protocol is using some custom software that is out of scope for this project
- Once Python code is in modules we can implement the unit tests
- After that all that is remaining is the documentation

### August 20, 2024

- Client happy with progress
- Moved most of `week-1` to Done
- Data extraction loop took much longer than expected
- Simplifying with Airbyte was a really good solution
- Still a few todos in the repo setup issue
- Need to plug in the pipeline and do some stress testing
- Check the response times and make sure it does not break nodes
- Mapping API will create a script that transforms the spec into a manifest, extracts data, and automates the process
- Writing the base code for the CLI
- Exploring output connectors is the current bottleneck. Will take a couple of days.
- We learned how to write the custom connectors in Pathway
- Need to store the generated embeddings into the vector store
- Added the critical endpoints to the spec. Still working on adding the other ones.
- Spun up and ran a Gaia node for 24 hours. Was a smooth process. Still need to do some additional testing locally and figure out how to connect to Sayonora's node.
- Working on the Docker configuration
- Exploring alternative models. Default is 5.3; it is the most available on public nodes. May need a lighter model than Nomic to improve the quality of responses.
- Need to identify other protocols for testing. Important to have DAO framework diversity (Compound, Moloch, Aragon) and Discourse content
- Need to add an issue for testing the entire pipeline with a different API. Need to make sure it can extract from any endpoint. Should find one with different auth and pagination. Try Gitcoin Grants or the [Agora's OP API](https://vote.optimism.io/api_v1) from Optimism.
- Need to add an issue for creating a plan for scaling the pipeline to handle other DAOs based on feedback from the testing phase.
- Need to conduct end-to-end testing to see how long it takes to go from data extraction to knowledge.
- How long the optimization takes depends on the results of testing.
- Once we get rid of the output connector bottleneck we should be able to do more work in parallel.
- Sayonara can focus on docs, Docker, infra, and finding other protocols and APIs
- We are confident the remaining work will be completed in two weeks
- That is one week longer than the original estimate of three weeks if the timeline started on the day the first payment was received
- We are doing a great job of documenting the process and this will be a helpful guide for other projects
- We should schedule a demo with the client on September 3.

### August 13, 2024

Doing a great job of adding notes and links to resources on the GitHub issues.

How are we feeling about the project?
- Making some progress
- A bit behind schedule
- Luckily we got some extra time waiting on the transaction
- Shout out to Sasquatch
- Underestimated how long this stuff actually takes
- We don't know what we don't know
- Have been updating the issues on GitHub
- Should catch up when we get some of these issues out of the way

#### Setup Repo

- Sayonora set up Poetry
- It is not working on Santiago's machine
- Maybe related to CPU
- Going to create a Dockerfile to make setup easier with containers

#### Data Extraction Loop

- Still working on this
- Decided the end product should be able to extract from any API
- Need to consider different auth and pagination
- Decided to switch to the [Airbyte Apache Doris ELT](https://airbyte.com/connectors/apache-doris) data integration destination connector
- Their local connectors feature allows you to define a YAML file for the endpoints and handles everything under the hood
- These will be the entrypoints for the pipeline
- Will replace the Pathway HTTP connector
- Pathway is still core for other features related to transforming, loading, and generating embeddings
- Need to run some tests after plugging in data extraction loop

#### OpenAPI Specification

- Sayonora identified the endpoints from Boardroom
- The spec we were provided is missing endpoints
- Writing this spec is outside of the scope of what we estimated
- Were expecting this to function out of the box
- Endpoint does not have the protocol ID
- Are having to make adjustments in Postman
- Should we ask Gaianet to ask Boardroom for a complete spec?
- They most likely made it by hand or just exported it with Swagger
- We can go through the endpoints and fill in the missing parameters and update the spec
- When you export the collection from Postman you can convert it to the OpenAPI spec
- This is two days of development work
- We need an additional quote for this effort

#### Output Connector

Uses Qdrant but they do not currently support Pathway.
We will have to implement our own.

#### Next Steps

- Once we have the data extraction loop and OpenAPI spec we can generate the endpoints for the pipeline and test the flow end-to-end
- Spinning Up Gaia Node can be done in parallel and Sayonora should get started on that once the spec is updated
- We also need to write tests on the pipeline
- Sayonora should be figuring out what other protocols we should add
- We should schedule another call when the data extraction loop is complete to break down the issues for `week-3`

#### Todo

- [ ] Get quote to client for [Update OpenAPI Specification](https://github.com/raid-guild/gaianet-rag-api-pipeline/issues/23)

### August 6, 2024

Will check with Sasquatch on when to expect payment. We are taking a risk getting started but do trust the client. We should protect ourselves and try to adhere to a standard process when possible. We also need to make sure the client is motivated to make the payment.

The issues from `week-1` look good. There are two Santiago will merge into one. But the format works well. We prefer the Kanban view and will try to make sure to assign ourselves when moving the issues to In Progress.

Currently below the limits and using the public API key. Sayonora is playing catch-up and had some issues related to types and how data is being handled by Pathway. These roadblocks should be cleared. Had some concerns with the spec being incomplete. Once the pipeline is in place we should make faster progress on the issues remaining in week 1.

Do we have experience with the OpenAPI Specification? The Swagger docs are not complete API documentation per Swagger or OpenAPI spec. Sayonora has been exploring ways of generating an API from the YAML file. The spec they gave us is not complete. They made it by hand or generated it. But it does not include some of the request parameters we need. Some sections are missing and there are errors that need to be manually fixed.
Some of what we thought was missing is actually at the bottom of the page. We need to identify if there is anything else missing and update the YAML file. How might we automate generating this? Sayonora found a validator with Swagger to generate a web page with functions.

#### Todo

### July 31, 2024

Hope y'all are ready to kick some ass. It is a short project so I want to make sure you have what you need and get out of the way.

#### Agenda

- Talk about preferences
- View the Kanban board
- Discuss how often we need to sync
- Communication preferences
- Break down this week's tasks

Don't want to micro-manage. Just want to make sure everyone knows what they are working on and that everything that needs to be done is accounted for.

#### Preferences

Will be working in Python instead of JavaScript. Would prefer larger issues with list item tasks to more specific issues. We should add a Testing column to the Kanban board. Add labels so we can filter the view by weeks.

[Miro Board](https://miro.com/app/board/uXjVKtvedaQ=/?share_link_id=889969883656)

- Cadence
- Comms
- Blockers

I am here to try to make your jobs easier. Santiago works for a few hours on the weekends. Everyone should provide updates at the end of their days and work on what needs to be handed off.

[Project Board on GitHub](https://github.com/orgs/raid-guild/projects/17/views/1)

### July 18

- Sydney Lai
- Sasquatch
- Sayonora
- Santiago
- Ξ2T

Present [RAG Pipeline GaiaNet x Boardroom - Project Scope of Work](https://hackmd.io/bsToF8rXRaC3hEjiqGyFYA)

Sydney's Goals:

1. Finalize Time
2. Estimate Costs

She seems to have a good understanding of the technology.

#### Discussion

Re: Boardroom API, JSON file into LangChain

In the future if the Boardroom API requires a proof of them having a key. Not a requirement to build out API verification. But make sure the capabilities are there to add this in the future. This project is just porting API data from Boardroom.
In the future it may require authentication with their own Boardroom API key to enter these nodes. For the first project we just need a static snapshot of data that can be turned off. It does not have to be a continuous flow. Want to prevent having to rebuild this in the future.

Is the fetching also using general search from the Internet? It is okay if the architecture is continuously fetching, just wondering what happens if you turn it off. Wants it to still be usable with existing data if the data source is turned off.

Building a knowledge base like gardening on YouTube. 7-year-old tomato videos are still relevant. We are building an archive. We want to make these nodes sources of information.

Friends with the founders of Boardroom. Want to present the plan and ask them.

Her numbers are just estimates. Every organization is different. They will pay for builders and will offset some of the initial costs of hosting but it is up to maintainers to decide if they want to continue.

Immediate future is the architecture you have laid out. Just wants to make sure it is not so custom it cannot be improved in the future. Are you doing a QA process to make sure it works?

As long as each DAO has their own node. Orange DAO, FWB. We need to make sure they came from Boardroom. Each DAO has their own node. The node that you are creating knows specifically about one DAO. No preference on what DAOs to use to prove the concept. They want us to spin up a node for all 350 DAOs.

### July 11

- Sasquatch
- Sayonora
- Santiago
- Ξ2T

#### Notes

Answers to our questions here: https://hackmd.io/yAkuQDqWTx-Xw5YcwfBXqQ?view#Questions

They do not want to provide examples that may influence our decisions. This may be like selecting a node from the [GaiaNet Chat](https://www.gaianet.ai/chat).

What primitives are we building for fetching data?

How do we want to structure this project?
- Static handover
- We maintain it

GaiaNet may be more likely to give us the bid if we plan to provide ongoing support. They are also working on a rewards program. They are trying to attract builders. They do not want to increase their recurring expenses. They are shopping other bids.

What are the infrastructure and devops costs for maintenance of a managed software solution? (infrastructure requirements)

We may only have to host the data pipeline.

Need to provide an example of what we plan to build? (architectural diagram)

### July 10

- Santiago
- Ξ2T

#### Notes

- Santiago worked with Jarry on Speedball but he was mostly coordinating async with Dekan
- He is feeling kinda lost on how to write the specification
- Started this [HackMD](https://hackmd.io/g6NVSPosQOOLM4PVWbdngg?view)
- Need to write an API that allows you to get data from DAOs to feed into LLM models on Gaia nodes
- See [GaiaNet Node: Setup Workflow](https://hackmd.io/g6NVSPosQOOLM4PVWbdngg?view#GaiaNet-Node-Setup-Workflow)
- You can:
  - Select the LLM
  - Add a knowledge base
  - Customize the prompts for requesting information and returning data
- He was able to get everything running locally
- And thinks we need to build a custom knowledge base with data exported from Boardroom
- We need to build the middleware API
- Should work with whatever model is selected
- Would like to make sure Gaia confirms his specs
- Are we missing anything?
- Would be good to have someone else to help
- Will work on a list of questions for the client

#### Repos

- Gaia Fork
- Middleware
- Demo App

#### Estimates

- Update install scripts for node forked from Gaia
- Create Gaia entrypoint with (server, vector DB, and ?): 3 days
- Create demo app to provide examples on how it is used to chat with the LLM: 3-4 days
- Documentation: 2 days

14 working days (3 weeks)

#### Todo

- [ ] Sasquatch and Ξ2T will research the costs of infrastructure requirements
- [ ] Sayonora will install local Gaia node
- [ ] Santiago will create a list of infrastructure requirements
- [ ] Santiago will work on creating an architectural diagram
- [x] Ξ2T will see if `@_.sayonara` and `@jarryingnaut` have experience working with LLMs and bash scripting
- [x] Santiago will work on filling out the spec in his HackMD file
- [x] Santiago will work on filling out the questions for the client in his HackMD file
- [x] Ξ2T will check with `@Sasquatch` to see if it is possible to verify what we think they are expecting
- [x] Ξ2T will put together a [GaiaNet x Boardroom](https://hackmd.io/yAkuQDqWTx-Xw5YcwfBXqQ?view) document to share with the client