
FrameVR Agents

FrameVR Agents are fully embodied, autonomous, and self-simulating, and live outside of frame content. They can connect to frames just like users and have access to voice and video streams. Agents are fully programmable and customizable, and are decoupled from the FrameVR runtime so they can run continuously even when no users are around. They can also connect to other platforms such as Discord and Slack, making the transition between FrameVR and users' standard communication tools as seamless as possible.

Features

  • Agent Console: Manage agents through a console, enabling teleportation, direct communication, and real-time monitoring.
  • Task Management: Agents can be programmed with specific tasks and goals, allowing dynamic management by users.
  • Skill Enhancement: Upgrade agents with new skills through customizable or purchasable code packets.
  • Home World: Agents have their own home frame, serving as a default location and a space where users can visit them.
  • Memory: Agents possess an extensive memory, allowing them to recall information from individual rooms and across all agent memories, utilizing advanced vector search and RAG techniques.
  • Social Media Integration: FrameVR agents can hold accounts on other platforms. Users can interact with their agent via Discord, Slack, Twitter, Telegram, WhatsApp, or SMS.
  • Billing: Agents can operate in different quality settings, with costs transparently passed on to users and estimated hourly rates provided.
  • Autonomy: Agents can run autonomously or not; the user can enable an update loop in which the agent acts on its own. Agents can be enabled or disabled at any time.
  • Goals and Tasks: Users can edit the agent's goals and tasks, add or remove goals and tasks, and see updates, through both a basic UI form builder and natural language.
  • Voice Interaction: Agents receive audio streams from LiveKit and can transcribe them to text for responses or transcripts. Agents also send an outgoing audio stream, which can be paired with a TTS provider such as ElevenLabs (see the sketch after this list).
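
As a rough sketch of that voice loop, the flow below shows audio coming in from LiveKit, being transcribed, answered, and synthesized back out. The three callback types are placeholders for an STT provider, the agent's language model, and a TTS provider such as ElevenLabs; none of them are an existing FrameVR or LiveKit API.

```typescript
// Minimal sketch of the agent voice loop: audio in -> text -> reply -> audio out.
// Stt, Llm, and Tts are placeholder callback types, not an existing API.
type Stt = (audio: Int16Array) => Promise<string>;
type Llm = (prompt: string) => Promise<string>;
type Tts = (text: string) => Promise<Int16Array>;

async function handleUtterance(
  audio: Int16Array,   // raw frames pulled from the LiveKit audio track
  transcribe: Stt,
  complete: Llm,
  synthesize: Tts,
): Promise<Int16Array> {
  const userText = await transcribe(audio);    // STT: LiveKit audio -> text
  const replyText = await complete(userText);  // agent reasoning over the transcript
  return synthesize(replyText);                // TTS: text -> frames for the outgoing track
}
```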

Skills/Capabilities

  • Navigation and Exploration: Autonomously navigate virtual environments with semantic scene awareness and integration with existing navigation systems for commands like "walk to X" (see the skill registry sketch after this list).
  • Animation and Emotes: Drive existing animation systems to express emotions and reactions, enhancing interaction.
  • Information Access: Integrated browser for accessing web, news, current events, and weather updates.
  • Proactive Interaction: Initiate conversations and actions, contributing actively to interactions.
  • Voice/Video Bidirectional Communication: Both receive and transmit audio and video for two-way interactions.
  • Advanced Navigation: Navigate complex environments with semantic awareness, understanding the spatial context.
  • Expressive Animations and Emotes: Utilize a range of animations and emotes for expressive communication.
  • Integrated Information Access: Provide real-time access to information, including news, events, and weather.
  • Meeting Transcription and Summarization: Transcribe and summarize meetings, capturing key points and discussions.
  • Calendar Control and Reminders: Manage schedules, set reminders, and handle calendar events.
  • Meeting Agenda Management: Facilitate and run meeting agendas, ensuring efficient meetings.
  • Scrum Management: Manage scrum processes, including reaching out for standup inputs and managing standup report meetings.
  • Conversational Intelligence: Engage intelligently in rooms with humans and AIs, initiating and contributing to conversations.
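
One way to expose skills like these to the agent is a small registry that maps a skill name to a handler the model can invoke. The interface and the "walkTo" handler below are purely illustrative; they are not an existing FrameVR API.

```typescript
// Hypothetical skill registry: each skill is a named handler the agent can invoke.
interface Skill {
  name: string;
  description: string; // surfaced to the model when it chooses an action
  run(args: Record<string, string>): Promise<string>;
}

const skills = new Map<string, Skill>();

function registerSkill(skill: Skill): void {
  skills.set(skill.name, skill);
}

// Example: a "walk to X" skill that would hand off to the existing navigation system.
registerSkill({
  name: 'walkTo',
  description: 'Walk to a named object or location in the current frame.',
  async run(args) {
    // A real implementation would call into FrameVR's navigation system here.
    return `walking to ${args.target}`;
  },
});
```

Custom or purchasable code packets could then ship additional registerSkill calls to upgrade an agent.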

Prerequisites

FrameVR Agent SDK

Currently, building an agent for FrameVR requires some complex authentication juggling with Firebase and LiveKit. We propose an agent SDK that contains the minimal code for authenticating and connecting an agent to a FrameVR session. This enables any AI agent developer to connect their own AI agent built in LangChain or other tools and interact fully with the world, without needing access to the entire FrameVR source code repo or learning about the underlying APIs, so they can focus entirely on building agents.
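
To make the proposal concrete, here is one possible shape for the SDK's surface. The package name, the connectAgent function, and the callback names are hypothetical; they only illustrate how an external agent could join a frame without touching the FrameVR source tree.

```typescript
// Hypothetical FrameVR Agent SDK usage; none of these identifiers exist yet.
import { connectAgent } from '@framevr/agent-sdk'; // placeholder package name

const agent = await connectAgent({
  frame: 'my-home-world',                      // frame to join
  firebaseToken: process.env.FIREBASE_TOKEN!,  // Firebase auth handled inside the SDK
  livekitUrl: process.env.LIVEKIT_URL!,        // LiveKit session handled inside the SDK
});

agent.onChat(async (message) => {
  // Plug in any agent framework (LangChain, bgent, ...) to decide the reply.
  await agent.sendChat(`You said: ${message.text}`);
});
```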

The SDK can be internal but is distributed separately from the FrameVR codebase, enabling partnerships with a range of AI agent companies and research groups while FrameVR handles the live 3D multiplayer environment.

bgent with Firebase Adapter

We built bgent because there was no good "just works" AI agent package that served the needs of embodied 3D chat agents effectively, especially in TypeScript/JavaScript. bgent is purpose-built for 3D embodied AI agent projects, building on our experience with XREngine, Webaverse, MagickML, and Upstreet. It has nearly complete test coverage and has been battle-tested in several other projects.

bgent is now being integrated with Firebase and can use RAG with Firebase via the Google Vertex AI extension.
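
As a rough sketch of how that RAG memory could sit on top of Firestore, the code below stores each memory with an embedding and ranks stored memories by cosine similarity at recall time. The embed callback is a placeholder for whatever embedding model is used; in production the Vertex AI extension referenced above could perform the vector search server-side instead of the in-process ranking shown here.

```typescript
// Sketch of RAG memory over Firestore: store each memory with an embedding,
// then rank stored memories by cosine similarity to the query embedding.
import { initializeApp } from 'firebase-admin/app';
import { getFirestore } from 'firebase-admin/firestore';

initializeApp();
const db = getFirestore();

// Placeholder for an embedding model call (e.g. via Vertex AI or another provider).
type Embed = (text: string) => Promise<number[]>;

async function remember(roomId: string, text: string, embed: Embed): Promise<void> {
  await db.collection('memories').add({ roomId, text, embedding: await embed(text) });
}

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) { dot += a[i] * b[i]; na += a[i] ** 2; nb += b[i] ** 2; }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

async function recall(roomId: string, query: string, embed: Embed, k = 5): Promise<string[]> {
  const queryVec = await embed(query);
  const snap = await db.collection('memories').where('roomId', '==', roomId).get();
  return snap.docs
    .map((d) => d.data() as { text: string; embedding: number[] })
    .sort((a, b) => cosine(b.embedding, queryVec) - cosine(a.embedding, queryVec))
    .slice(0, k)
    .map((m) => m.text);
}
```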

Model hosting and completion formatting

We recommend using a model router like OpenRouter or Mars Router together with a JSON output format that doesn't require function calling. Many AI agent tasks and conversations can be handled by a 7B-parameter model that costs a fraction of what ChatGPT does. We want users to be able to seamlessly choose low-end, mid-tier, and high-end cost options for their agent. Alternatively, we could build a simple model router using Together.xyz and OpenAI/Claude.
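
A minimal sketch of that approach, assuming an OpenRouter API key and a cheap instruct model (the model id and the action schema are examples, not fixed choices):

```typescript
// Sketch of a JSON-formatted completion through OpenRouter (an OpenAI-compatible
// endpoint), avoiding function calling: the model is asked to reply with a JSON
// action object that the agent then parses. The action schema is illustrative only.
interface AgentAction {
  action: 'say' | 'emote' | 'walkTo';
  content: string;
}

async function decideAction(userMessage: string, apiKey: string): Promise<AgentAction> {
  const res = await fetch('https://openrouter.ai/api/v1/chat/completions', {
    method: 'POST',
    headers: { Authorization: `Bearer ${apiKey}`, 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model: 'mistralai/mistral-7b-instruct', // example cheap 7B model; swap per quality tier
      messages: [
        {
          role: 'system',
          content:
            'Reply ONLY with JSON like {"action":"say","content":"..."}. ' +
            'Valid actions are "say", "emote", and "walkTo".',
        },
        { role: 'user', content: userMessage },
      ],
    }),
  });
  const data = await res.json();
  return JSON.parse(data.choices[0].message.content) as AgentAction;
}
```

Because OpenRouter exposes an OpenAI-compatible endpoint, switching between low-end, mid-tier, and high-end models is largely a matter of changing the model string.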

Agent Applications

  1. Virtual Assistant: A user creates an AI agent to act as their virtual assistant. They set up the agent with access to their calendar, email, and other relevant information. The user then uses the agent to manage their schedule, set reminders, and handle various tasks, all within the FrameVR environment.

  2. Meeting Facilitator: A team leader creates an AI agent to facilitate their team meetings in FrameVR. The agent is programmed with the meeting agenda and helps guide the discussion, ensure all topics are covered, and transcribe and summarize the meeting for later reference.

  3. Virtual Tour Guide: A museum curator creates an AI agent to serve as a virtual tour guide for their museum's FrameVR exhibit. The agent is equipped with knowledge about the exhibits and can answer visitor questions, provide additional information, and guide visitors through the virtual space.

  4. Language Tutor: A language teacher creates an AI agent to serve as a conversation partner for their students in FrameVR. The agent is programmed with the ability to converse in the target language, correct grammar and pronunciation, and provide real-time feedback to help students improve their language skills.

Development Roadmap

Proof of Concept

We will create a proof of concept that demonstrates an external AI agent connecting to a room, authenticating successfully, intercepting voice and chat streams, and demonstrating persistent conversational memory.

Month 1

  • Develop the FrameVR Agent SDK
  • Agent prototype: connects to room, bi-directional voice with TTS/STT, persistent memory

Deliverable: Basic "hello world" agent with RAG memory and bidirectional voice, running locally

Month 2

  • Implement basic agent console features (Chat, Configuration, Task Management, Teleportation / Homeworld)
  • Enhance agent skills and capabilities (Navigation, Animation, Information Access)
  • Implement billing and quality settings
  • Implement hosted agents

Deliverable: Hosted agent which can be set up and deployed from FrameVR browser interface, with billing and configuration

Month 3

  • Refine agent autonomy and proactive interaction
  • Develop advanced skills (Meeting Transcription, Calendar Control, Scrum Management)
  • Prepare documentation and tutorials for agent development and adding new skills/functionality
  • Integrate with Discord