Webaverse AI Architecture

The following covers the AI systems that must be running for Webaverse to work properly.


Domains

  • Text Generation
    Dialog
    Quests
    Descriptions
    Lore
    Ongoing Story
  • Image Generation
    Image previews
  • Voice Generation
    Character voices
  • Animation Generation
    Character animations
    Mob animations
    Pet animations
  • Model Generation
    Weapons
    Consumables
    Wearables
    Mobs
    Pets
    Vehicles
    Characters
  • Music Generation
    In-game Music
    Character theme songs
    Musical objects
  • Sound Generation
    Weapon sound FX
    Mob sound FX
    Pet sound FX
    Vehicle sound FX

Models

Search & ANN Matching

Used for search and approximate nearest neighbor (ANN) matching on anything, especially searching Wikipedia or finding the closest match to something in a corpus

STATUS: DEPLOYED
REPO: https://github.com/webaverse/weaviate-server

Current Implementation

Weaviate - https://weaviate.io/
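To illustrate what "closest match" means here, the sketch below does brute-force cosine-similarity matching over a toy corpus of embeddings. This is only a conceptual stand-in: Weaviate indexes the vectors so lookups are sublinear, and the example corpus and embeddings are made up.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def nearest_match(query_vec, corpus):
    """Return the corpus entry whose embedding is closest to the query.

    `corpus` is a list of (text, embedding) pairs; in practice the vector
    database indexes the embeddings, so this is not a full scan.
    """
    return max(corpus, key=lambda item: cosine_similarity(query_vec, item[1]))

corpus = [
    ("a rusty sword", [0.9, 0.1, 0.0]),
    ("a healing potion", [0.1, 0.8, 0.3]),
    ("a stone golem", [0.0, 0.2, 0.9]),
]
text, _ = nearest_match([0.85, 0.15, 0.05], corpus)
print(text)  # → a rusty sword
```

The same pattern applies whether the embeddings come from a Wikipedia dump or an in-game lore corpus; only the embedding model and the index change.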

Future Implementation Candidates

Text Generation

  • Used for generating all text and completions. Needs to be general purpose and fine-tunable, able to handle long bodies of text and complex situations. Fast response time is also helpful.

STATUS: DEPLOYED

Current Implementation

GPT-3

  • Great in every way, but expensive and closed source; there are many things we can't do with it
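Completion-based dialog comes down to assembling a prompt (persona, rolling history, speaker cue) and letting the model continue from the cue. The sketch below shows one plausible prompt layout; the character name, bio, and template are illustrative, not Webaverse's actual prompts.

```python
def build_dialog_prompt(character, bio, history, player_line):
    """Assemble a completion prompt for one in-game dialog turn.

    The layout (persona block + recent history + speaker cue) is a common
    pattern for completion models like GPT-3; the model continues the text
    after the trailing "Character:" cue.
    """
    lines = [
        f"{character} is a character in Webaverse. {bio}",
        "",
    ]
    for speaker, text in history[-6:]:  # keep only recent turns in context
        lines.append(f"{speaker}: {text}")
    lines.append(f"Player: {player_line}")
    lines.append(f"{character}:")  # completion starts here
    return "\n".join(lines)

prompt = build_dialog_prompt(
    "Scillia",
    "She is a drop hunter who trades rare items.",
    [("Player", "Hello!"), ("Scillia", "Welcome, traveler.")],
    "What do you have for sale?",
)
```

Truncating the history is what keeps requests fast and within the model's context window; a longer-term memory would need the search/ANN layer above.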

Future Implementation Candidates

OPT-175b

Unified Language Model

Emotion Recognition and Toxicity Detection

Used for rapidly determining sentiment, emotion and hate speech in text

STATUS: NOT DEPLOYED

Future Implementation Candidates

XtremeDistil trained on GoEmotion dataset

Distilbert Toxicity detection
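Whichever models are chosen, the game needs to reduce their raw outputs to a simple verdict. A minimal post-processing sketch, assuming a GoEmotions-style label distribution from the emotion model and a single probability from the toxicity head (all scores below are made up):

```python
def interpret_scores(emotion_scores, toxicity_score, toxicity_threshold=0.5):
    """Reduce raw classifier outputs to a game-facing verdict.

    `emotion_scores` maps GoEmotions-style labels to probabilities;
    `toxicity_score` is one probability from a toxicity classifier.
    """
    top_emotion = max(emotion_scores, key=emotion_scores.get)
    return {
        "emotion": top_emotion,
        "confidence": emotion_scores[top_emotion],
        "toxic": toxicity_score >= toxicity_threshold,
    }

verdict = interpret_scores(
    {"joy": 0.72, "anger": 0.08, "neutral": 0.20},
    toxicity_score=0.03,
)
print(verdict)  # → {'emotion': 'joy', 'confidence': 0.72, 'toxic': False}
```

The toxicity threshold is a tuning knob: lower values moderate more aggressively at the cost of false positives.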

Voice Generation

Used for all character voice generation. Needs to be super fast and human-sounding

STATUS: DEPLOYED
REPO: https://github.com/webaverse/tiktalknet

Current Implementation

TikTalkNet - https://github.com/webaverse/tiktalknet

  • Super fast and lightweight but not SOTA

Future Implementation Candidates

Image Generation

Used for generating character portraits, backgrounds, objects, textures and in-game artwork

STATUS: DEPLOYED
REPO: https://github.com/webaverse/stable-diffusion-webui
DEPRECATED: https://github.com/webaverse/stable-diffusion
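A sketch of how the game might call the webui for a texture or portrait, assuming the webaverse fork keeps the upstream AUTOMATIC1111 HTTP API (`/sdapi/v1/txt2img`); the endpoint, field names and defaults below are from upstream and are not verified against the fork.

```python
import json

def txt2img_payload(prompt, width=512, height=512, steps=20, cfg_scale=7.0):
    """Build the JSON body for a txt2img call (portraits, textures, etc.)."""
    return {
        "prompt": prompt,
        "negative_prompt": "blurry, low quality",
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg_scale,
    }

body = json.dumps(txt2img_payload("pixel art portrait of a drop hunter"))
# e.g. requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", data=body)
# in the upstream API, the response JSON carries base64-encoded PNGs
# under the "images" key
```

Keeping payload construction separate from the HTTP call makes it easy to swap backends if the deployed fork diverges from upstream.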

Future Implementation Candidates

Sound Generation

Used for ambient sounds in the world, as well as sound effects attached to objects and mobs

STATUS: DEPLOYED
REPO: https://github.com/webaverse/diffsound

Current Implementation

DiffSound

Future Implementation Candidates

Audio Diffusion - similar to DiffSound, but the samples are much better, probably due to the datasets

Music Generation (Audio)

Used for all music generated in Webaverse

STATUS: NOT DEPLOYED

Future Implementation Candidates

This version of Audio Diffusion features models fine-tuned on specific pieces

Music Generation (MIDI)

Used for ambient audio generation in Webaverse. May be much faster to generate and process consistent long pieces than audio-only methods.

STATUS: NOT DEPLOYED

Future Implementation Candidates

Text to 3D

Used for generation of all 3D objects and features in the world, based on descriptions, images or general class types

STATUS: DEPLOYED
REPO: https://github.com/webaverse/stable-dreamfusion

Current Implementation

Stable Dreamfusion - https://github.com/ashawkey/stable-dreamfusion

  • SOTA, uses SD
  • Slow (50 minutes/generation) and quality isn't great

GET3D - https://github.com/nv-tlabs/GET3D

  • Relatively fast, ~1 min/model
  • Needs research on conditional generation

Future Implementation Candidates

https://nv-tlabs.github.io/LION/ - not released yet

Text to Motion

Used for generation of humanoid animations

STATUS: DEPLOYED
REPO: https://github.com/webaverse/motion-diffusion-model

Current Implementation

Motion Diffusion

Other Implementations

https://github.com/mingyuan-zhang/MotionDiffuse - Seems very similar

Image to Text & Visual Question Answering

Used for describing images so that the game can incorporate user images into the story, analyze screenshots, generate labels for training data, or generate prompts for inverted generation

STATUS: NOT DEPLOYED

Future Implementation Candidates

Audio to Text

Used for captioning or describing audio or sounds

STATUS: NOT DEPLOYED

Future Implementation Candidates

https://github.com/TheoCoombes/ClipCap - uses CLAP from LAION to do many things, including captioning audio and audio2img

2D Image Animation

Generate animation from 2D images, especially synced with audio or text for characters and portraits

STATUS: NOT DEPLOYED

Future Implementation Candidates

https://github.com/yoyo-nb/Thin-Plate-Spline-Motion-Model

Model Rigging

Add bones to objects, characters, mobs and pets that don't have a rig

STATUS: NOT DEPLOYED

Future Implementation Candidates


Datasets

3D models

Human models
