[toc]

QA format
:::info
Q:
A:
:::

https://csuprod-my.sharepoint.com/:p:/r/personal/bbrouf01_student_csu_edu_au1/_layouts/15/Doc.aspx?sourcedoc=%7BD35BD5A4-6AB2-4108-ADF2-617D80C3D6BB%7D&file=Using%20Microsoft%20JARVIS%20to%20generate%20insights.pptx&action=edit&mobileredirect=true

## Slide 1 - 2 Cover & Table Of Contents

(Thanks for coming blah blah blah)

## Slide 3 - 5 Introduction

### Slide 4 - Team Member Introduction

Each person should introduce themselves and their role within the project as listed on the slide, e.g. Cody, team lead and contributing member...

### Slide 5 - What Is MS JARVIS

Moving on, I'd like to start by orienting you with the subject of our team's project: Generating Insights with Microsoft JARVIS. Before I expand on that a little more, I'd just like to run some background past those of you who aren't familiar. JARVIS itself is an attempt at general artificial intelligence by Microsoft. It's an approach that uses an LLM controller to organize several other task-dedicated models so that they act as one large, capable model. What we hope to achieve with this technology is an AGI that can analyze documentation, among other things, and generate 'intelligent' insights. This global analysis of a source has large potential for improving any large database of information that a normal individual would struggle to consume on their own.

:::info
Q: How does JARVIS connect the models using an LLM exactly?
A: Although we cover it in detail later, I can summarize that it uses several stepped phases, including planning, scheduling and executing, which involves the use of several AI model tools.
:::

## Slide 6 - 7 Methodology

Before we dive too deeply into how all this works and how we're doing it, I'd like to briefly touch on our approach in terms of development philosophy: generally, how we're planning and executing things.

The nature of this project made us decide upon an Agile-esque approach pretty quickly. Based on the size and skillset of our team, it seemed obvious that picking a more rapid yet flexible approach would be ideal. The circumstances of our stakeholder also influenced this decision, since this particular methodology places a high priority on satisfying the customer through early and continuous delivery. However, what I specifically mean when I say we 'picked' Agile is that we essentially adopted the most pertinent features of that methodology and adapted them into something we were more comfortable with.

A particular note about our method was the process of breaking down tasks and assigning them to members of the group. For the most part, excluding a few technical facets assigned to those especially capable, the distribution of tasks was theory-crafted rather than researched. Otherwise, executing SCRUM pipelines for this approach was fairly unproblematic. The distribution of tasks was typically diffused enough to allow each member to complete tasks, with little exception for time extensions between and during sprints.

:::warning
(This is bullshit, disregard. I just need extra script)
:::

An important lesson we learnt from this lack of research and planning into task distribution was the discomfort, and subsequent inefficiency, experienced by our members. In the future, we'd like to put more effort into researching the nature of tasks to build the strongest distribution amongst members.
:::warning
TODO:
- CC Consider talking about that expert methodology bullshit if others agree it's worth it to appease the Dean:tm:
:::

:::info
Q: What other methodologies did you consider, and why didn't you choose them?
A: Other approaches, like waterfall, didn't seem as flexible as Agile, which was a major point we'd need on a project like this that was constantly shifting in terms of solution and perspective.
:::

## Slide 8 - 11 Technology And Communication

:::info
Can probably get away with just regurgitating the text on the slides and fielding questions at the end of each slide to pad for time
:::

### Slide 9 - CORE AI Stack

JARVIS:
Python:
OpenAI:
Hugging Face:

### Slide 10 - Infrastructure, Interface and Automation

GRADIO:
AZURE:
GitHub:
GitHub Actions:

### Slide 11 - Collaboration Stack

OneDrive:
Discord:
Visual Studio Code:

## Slide 12 - 17 System Architecture

### Slide 13 - Data flow diagram

:::info
Information here about this diagram, why and how it works and what it is showing
:::

(Slide 12) Regarding the system's overall architecture...

(Slide 13) The general overview of how generating insights is proposed to work starts with the documentation. From here, as mentioned previously, this is fed into the system through a Gradio interface. Gradio then feeds it into JARVIS, which performs its planning and step-generation process to feed the document into several AI models, concluding in a final output.

### Slide 14 - Data Handling Process

:::info
The basic rules we use when handling information given to us by external sources along with our internal ideas.
:::

Regarding the actual handling of input data, the documentation and data are anonymized, meaning that no personal data will be input through the system. Using the API keys and login information provided by the stakeholder, the project secrets are handled confidentially. To further ensure confidentiality, all connections are made via a secure environment.

### Slide 15 - 17 Architecture Diagrams

Here is the architecture diagram showing the pathway and processes. Firstly, we have the user input, formatted as a text file, JSON or other. This is uploaded to JARVIS via Gradio. Using a combination of ChatGPT and Hugging Face, JARVIS will perform four stages:
1. task planning;
2. model selection;
3. task execution;
4. response generation.

Using ChatGPT, the user's request is analysed to understand intention and break it down into solvable tasks. To solve the tasks, ChatGPT selects models hosted on Hugging Face. Task execution invokes each selected model and returns the results to ChatGPT. ChatGPT then integrates the model predictions and generates a response. This output is sent back to Gradio.

## Slide 18 - 20 Looking Forward (CI/CD & Scalability) (Brandon)

### Slide 19 - CI/CD

We implemented Continuous Integration into our development cycle to automate test execution whenever code is pushed to any branch of the repository. This was done using Pytest, and on the left-hand side of the slide, you can see our current mock test setup. At the moment, both tests assert a true value, which ensures a passing result and helps validate that our CI pipeline is functional. In the future, we plan to expand these tests to validate actual model output and verify live API connections. This will allow us to catch issues before code reaches production - which becomes especially important once our Continuous Deployment pipeline is in place.
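For reference, a minimal sketch of the kind of placeholder tests described above - not our exact repository code; the test names are illustrative:

```python
# Placeholder tests of the kind shown on the slide: each one asserts a
# truthy value, so the run always passes and simply proves the CI wiring
# (checkout, dependency install, pytest invocation) is functional.
# Test names are illustrative, not the exact ones in our repository.

def test_ci_pipeline_runs():
    # Will later be replaced with a check of actual model output.
    assert True


def test_api_connection_placeholder():
    # Will later be replaced with a live API connectivity check.
    assert True
```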
While we haven't implemented CD yet, we've scoped it out and it's ready for the next phase of development. As covered earlier, we plan to use Microsoft Azure as our hosting solution, and this will eventually include containerization and environment automation. For this development cycle, our priority has been ensuring that our CI tests are functional, reliable, and aligned with our Agile process. Once that's locked down, we'll move to activating our CD workflows.

The importance of both CI and CD pipelines ties directly into our choice of an Agile workflow. Agile thrives on continuous feedback - and our CI setup delivers that immediately to developers. If a test fails, the loop begins again: a new iteration, a new push, and a new test cycle. You can see that feedback loop visualized on the right-hand side of the slide.

Once CD is in place, stakeholders will also benefit - they'll be able to directly interact with updated model deployments and provide real-time feedback, helping us tailor the system to better meet their specific business needs.

### Slide 20 - Scalability

Shifting to scalability - at this stage, we're still in the early development cycle, but we've started to consider how our system might need to grow over time. Our current setup is quite simple, but we're aware that as the project progresses, things like API load, model response times, and dataset sizes could become more complex. While we haven't implemented advanced scaling techniques, we are keeping them in mind.

**JARVIS (Core Framework):** JARVIS is currently being used in its default configuration. As we continue development, we'll need to monitor whether it can handle more complex datasets or longer interactions efficiently. If we see performance bottlenecks, we may need to restructure how tasks and models are invoked to ensure the system stays responsive.

**Python (Backend Logic):** All our backend logic, including API handlers and model routing, is written in Python. While Python is flexible and easy to work with, performance can become a concern under heavier loads. If we notice slowdowns during testing, we might consider optimizing critical paths or even exploring asynchronous processing with tools like FastAPI or asyncio.

**Azure (Hosting Environment):** We're planning to host our system on Microsoft Azure. While we haven't deployed it there yet, we're aware Azure provides tools for autoscaling, containerization, and distributed computing. These will be useful later if we need to handle multiple users or larger volumes of data - but for now, we're just keeping these options on our radar as part of future scaling.

## Slide 21 - 25 - DEMO (Blake)

I'm Blake, and I'll be leading the section about active development and learnings for the team throughout the semester.

### Slide 22 - API & Hugging Face

Let's start with API design and testing. JARVIS relies on two APIs: the OpenAI (or Azure AI) API for accessing ChatGPT, the core orchestration model, and the Hugging Face API for accessing specialised models. JARVIS' primary function of model orchestration won't work with simple Completion APIs, and instead relies on Chat Completion APIs. These enable multi-turn dialogue and the orchestration of multiple models as part of its workflow. The Hugging Face API is used for accessing specialised models to perform specific tasks. JARVIS supports multiple inference modes for running these AI models, such as lite mode for running models on available Hugging Face Inference Endpoints.
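To make the Completion vs Chat Completion distinction a bit more concrete, here is a rough sketch of a Chat Completion call - assuming the openai>=1.0 Python client interface (the JARVIS codebase may pin an older version with a different call style), and with the model name and prompts as placeholders rather than our actual configuration:

```python
import os

from openai import OpenAI  # assumes the openai>=1.0 client interface

# Chat Completions take a list of role-tagged messages rather than a single
# prompt string, which is what lets the orchestration model carry multi-turn
# context while it plans tasks and selects specialised models.
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder; the real model name comes from the JARVIS config
    messages=[
        {"role": "system", "content": "You are a task-planning controller."},
        {"role": "user", "content": "Summarise the key findings in the attached report."},
    ],
)

print(response.choices[0].message.content)
```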
### Slide 23 - Testing (API & Hugging Face)

Standalone API testing can be accomplished using Pytest, to test and validate input and output from API calls. Testing ensures API calls are functional and within expected parameters. A functional pytest would emulate the basic workflow of JARVIS: pass input to the Azure AI API and validate the output, then pass that output into a specialised AI model using the HuggingGPT API, validate that output, and return it to the Azure AI model for final processing. Input and output validation can be done by using pre-defined inputs with known outputs, such as mathematical problems, basic rudimentary questions ("how many 'g's in hugginggpt?"), or transcription of audio or video files. (A minimal sketch of one such known-answer test appears at the end of this demo section.)

### Slide 24 - 27 DEMO Photos (Blake B)

#### Slide 24 | JARVIS Implementation

Our initial hurdles began with identifying prerequisites and tools required by JARVIS which were mentioned but not explicitly stated in their documentation. When installing the JARVIS requirements inside the isolated conda environment, we encountered issues with missing dependencies or packages, or unsupported versions. We encountered errors with espnet packages for voice recognition, while downgrading other packages such as werkzeug, flask and huggingface_hub due to functions JARVIS relies on that are removed in newer package versions. As shown on the slide, we've raised these issues in the GitHub repository to track the deprecations and their solutions or workarounds. We have also logged them in the GitHub wiki under our Experiment Log.

#### Slide 25 | Signs of early success . . .

After the implementation hurdles, we reached early signs of progress. Once we were able to successfully run JARVIS via CLI, we began encountering issues with deprecated models and API calling. By solving one bug we uncovered even more - such is the spirit of programming. When attempting to use other models, or utilise Azure AI models, we ran into deprecated models such as text-davinci-003 and API testing issues. After validating a set of API keys for Azure, thanks to Dean's assistance, and personal OpenAI keys, we are now able to make successful calls using JARVIS via CLI. This did involve having to update files such as the YAML config files and the get_tokens_ids Python file to add support for newer models.

#### Slide 26 | Current State of Development

As mentioned, thanks to Dean's assistance during a dev session, we were able to successfully resolve issues with utilising Azure API keys and make successful API calls. During the session we were able to identify and code in supported models, and resolve issues with Chat Completion support for newer models. By this point, we encountered an issue with the orchestration model being unable to identify available Hugging Face models for tasks. The team was also able to get an early working Gradio interface; unfortunately, it currently results in an "Error" message with no information or CLI output.

#### Slide 27 | Next Steps for Development

This brings us to our current state, leaving us with three main goals to achieve. The first is to resolve the deprecation and dependency issues we encountered in JARVIS; Issue 56 has been raised regarding this. The second is to investigate issues around specialised models, and how we can utilise Azure AI models instead. The last is to achieve a seamless deployment of JARVIS to act as a functional base.
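As flagged back on Slide 23, a known-answer test along those lines could look roughly like the sketch below. This is purely illustrative: `call_chat_completion` is a hypothetical helper standing in for whichever wrapper around the Azure/OpenAI call we end up writing, and the import path is not a real module in the repository.

```python
# Hypothetical known-answer tests in the style described on Slide 23.
# `call_chat_completion` is a placeholder for a project helper that sends a
# prompt to the Azure/OpenAI Chat Completion endpoint and returns the reply text.
from insights.api import call_chat_completion  # illustrative import, not a real module


def test_known_answer_arithmetic():
    # Pre-defined input with a known output: a simple maths problem.
    reply = call_chat_completion("What is 7 * 6? Reply with the number only.")
    assert "42" in reply


def test_known_answer_letter_count():
    # The "how many 'g's in hugginggpt?" style of rudimentary question.
    reply = call_chat_completion("How many letter g's are in 'hugginggpt'? Reply with the number only.")
    assert "4" in reply
```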
The project wiki currently has a Deploying JARVIS Instance Guide, with a Single Deployment Script that has been rigorously tested to confirm repeatability.

----

----

:::danger
Go no further, this is deprecated content
:::

---

---

## Slide System Architecture & Diagrams

The JARVIS architecture is based on a multistep planning process. The system will receive a task, then attempt to break it down into several steps accomplishable by the AI models available to it. This task planning is achieved by an LLM controller, such as one provided by OpenAI, which will attempt to schedule models to complete the tasks.

:::info
Q: Does this work?
A: Probably :)
:::

How we leverage this process to generate insights is by having JARVIS break down the targeted documentation into several analyzable stacks of information. These stacks can be text, images, and audio. Once the documentation has been broken down into its stacks, and respectively linked, the

## Technology Stack

### Jarvis Technology Stack

The technology stack for JARVIS is pretty wide, but we're going to try and focus on the most pertinent components. Of course, at the core, you have Microsoft JARVIS itself, which is supported by Python. Python is what the majority of JARVIS is written in - 90.2%. Python lends itself well to our needs: it's a pretty high-level language, which makes it easy for rapid development and prototyping. More specifically, Python is great for data science and other analytics used in AI.

![image](https://hackmd.io/_uploads/rkuL9SbMll.png)

The LLM models we're using for this project come, for the most part, from the OpenAI line, using the OpenAI API. This API is what generally gives us access to our various models. However, most of the non-LLM models are sourced from Hugging Face. This is where we get pre-trained models to handle tasks such as image recognition and anything else OpenAI doesn't cover.

### Jarvis Technology Stack Cont.

In terms of hosting and user interface, we're using Azure and Gradio respectively. Gradio is our entire frontend stack. It's pretty lightweight in terms of workload, so we were able to spend more time with the AGI. The visual quality is fine - it's obviously made by people with better taste than me. Azure is our hosting solution. All the components of the JARVIS AGI require a decent amount of hosting, in terms of server hardware. This service generally gives us enough resources to operate our current tests.

### Development Technology Stack

The development stack for this project is pretty modest. Our most stand-out tools are GitHub and VSCode. GitHub is our version control, code collaboration, task tracking, wiki, and integration manager. It does pretty much everything. GitHub's continuous integration / deployment tooling is especially helpful for this project.

### Collaboration Technology Stack

Our dedicated collaboration stack is also pretty modest. We find that, at a minimum, we only need communication and file sharing services to effectively collaborate at our current team size. Discord is our choice of communication. For what it is, it's a pretty decent free service. It's also common enough that many of us were already familiar with the interface; for those of us who weren't, learning was pretty easy. For sharing files, we decided to use OneDrive. OneDrive is far more centralized than the other solutions, which was imperative for the team composition to have a single unified space. The built-in Microsoft support made it very convenient to use the web versions of existing Microsoft suite tools.
<details>
<summary> Slide Scalability & Performance Considerations </summary>

*The scalability and overall performance of this project leans heavily on several core components from the technology stack covered earlier. Broadly, these components control the volume and speed of data the system can handle, which directly correlates to how far it can be scaled and how it will perform.*

*Firstly, Microsoft JARVIS itself is likely to be the biggest concern regarding these directives. Because all data goes through JARVIS, it will directly bottleneck the process. This will likely be determined by how JARVIS is built, specifically what language it's written in and what hardware it's running on.*

*The majority of JARVIS is written in Python - over 90%, as stated earlier. As an interpreted language, Python can certainly be considered on the slow side. The nature of this language means that runtimes will grow much longer as the code and service are expanded. Although certain optimizations can be made in Python, the underlying processes will always be far less efficient than in other languages.*

*The hardware JARVIS is being hosted on is provided by Azure. The processing power and speed of this hardware will directly impact how the system can be scaled and how it performs. In terms of scale, it should be possible to add more hardware to the available resources. However, process efficiency directly relies on the characteristics of this hardware. Things like hardware processing and communication speeds will be under close scrutiny when it comes to improving system performance.*

*An ideal approach to scaling this system into a larger service would most likely be increasing hardware processing power. It's almost out of the question to consider rewriting the code in a more efficient language, based on the time and monetary investment that would require.*

</details>

@RaccOff - I have redone the Scalability section, hence I moved this into a drop-down so it's hidden. The new one is above under the heading "Slide 18 - 20 Looking Forward (CI/CD & Scalability) (Brandon)"

## Slide API Design & Testing

TODO

## Slide Live Presentation & Team Engagement

TODO

## Slide Closing

TODO