# Webaverse AI Architecture
The following covers all of the AI systems Webaverse runs and requires to operate properly.
<br />
# Domains
- Text Generation
  - Dialog
  - Quests
  - Descriptions
  - Lore
  - Ongoing Story
- Image Generation
  - Image previews
- Voice Generation
  - Character voices
- Animation Generation
  - Character animations
  - Mob animations
  - Pet animations
- Model Generation
  - Weapons
  - Consumables
  - Wearables
  - Mobs
  - Pets
  - Vehicles
  - Characters
- Music Generation
  - In-game Music
  - Character theme songs
  - Musical objects
- Sound Generation
  - Weapon sound FX
  - Mob sound FX
  - Pet sound FX
  - Vehicle sound FX
<br />
# Models
## Search
Used for searching and doing approximate nearest neighbor (ANN) matching on anything, especially searching Wikipedia or searching through a corpus for the closest match to a query
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/weaviate-server
### Current Implementation
Weaviate - https://weaviate.io/
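For illustration, a minimal Python sketch of an ANN query against a Weaviate instance, using the official `weaviate-client` library (v3 API). The `Lore` class, its `body` field, and the localhost URL are assumptions rather than the deployed schema, and `nearText` requires a text2vec vectorizer module to be enabled on the server:

```python
# A minimal sketch, assuming a "Lore" class with a "body" text field exists
# and a text2vec vectorizer module is enabled on the server.
import weaviate

client = weaviate.Client("http://localhost:8080")

# nearText performs the ANN match against the vectorized corpus.
result = (
    client.query
    .get("Lore", ["body"])
    .with_near_text({"concepts": ["a sword forged in dragon fire"]})
    .with_limit(5)
    .do()
)
print(result["data"]["Get"]["Lore"])
```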
### Future Implementation Candidates
- https://qdrant.tech/benchmarks/single-node-speed-benchmark/ -- much faster than Weaviate
- https://github.com/facebookresearch/faiss -- in development at Facebook for a long time, has many compelling features, and is being baked into Blenderbot
## Text Generation
Used for generating all text and completions. Needs to be general purpose and fine-tunable, and able to handle long bodies of text and complex situations. Fast response time is also helpful.
**STATUS: DEPLOYED**
### Current Implementation
GPT-3
- Great in every way, but expensive and not open source; there are many things we can't do with it
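For reference, a hedged sketch of a completion call through the `openai` Python library; the engine name, prompt, and parameters are illustrative, not Webaverse's actual text pipeline:

```python
# A minimal sketch, not the production dialog pipeline.
import openai

openai.api_key = "sk-..."  # supplied via an environment variable in practice

completion = openai.Completion.create(
    engine="text-davinci-002",  # illustrative engine choice
    prompt="An NPC blacksmith greets the player arriving at the forge:",
    max_tokens=64,
    temperature=0.7,
)
print(completion.choices[0].text)
```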
### Future Implementation Candidates
OPT-175b
- https://opt.alpa.ai/ / https://alpa.ai/tutorials/opt_serving.html
- Comparable to early pre-instruct GPT-3, with good quality when fine-tuned
- Used in Blenderbot -- https://blenderbot.ai/
Unified Language Model
- https://github.com/microsoft/unilm
## Emotion Recognition and Toxicity Detection
Used for rapidly determining sentiment, emotion, and hate speech in text
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
XtremeDistil trained on the GoEmotions dataset
- https://huggingface.co/bergum/xtremedistil-l6-h384-go-emotion
- Fast enough to run in the browser
DistilBERT toxicity detection
- https://huggingface.co/dapang/distilbert-base-uncased-finetuned-toxicity
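Both candidates can be exercised with the Hugging Face `transformers` pipeline API; a minimal sketch using the model IDs linked above, with illustrative inputs:

```python
# A minimal sketch; both models are text-classification heads on Hugging Face.
from transformers import pipeline

emotion = pipeline("text-classification",
                   model="bergum/xtremedistil-l6-h384-go-emotion")
toxicity = pipeline("text-classification",
                    model="dapang/distilbert-base-uncased-finetuned-toxicity")

print(emotion("I can't believe you found my lost pet!"))
print(toxicity("You are the worst player I have ever met."))
```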
## Voice Generation
Used for all character voice generation. Needs to be super fast and human-sounding
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/tiktalknet
### Current Implementation
TikTalkNet - https://github.com/webaverse/tiktalknet
- Super fast and lightweight but not SOTA
### Future Implementation Candidates
- https://github.com/NATSpeech/NATSpeech
- https://github.com/Rongjiehuang/ProDiff
- https://github.com/keonlee9420/DiffGAN-TTS
## Image Generation
Used for generating character portraits, backgrounds, objects, textures and in-game artwork
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/stable-diffusion-webui
DEPRECATED: https://github.com/webaverse/stable-diffusion
### Future Implementation Candidates
- Kali Yuga is making great stuff, especially for 8-bit and pixel art, but it's all based on k-diffusion and Disco Diffusion
  - https://github.com/KaliYuga-ai/Pixel-Art-Diffusion/blob/main/Pixel_Art_Diffusion_v3_0_(With_Disco_Symmetry).ipynb
  - https://colab.research.google.com/drive/1ANvbcAI20-B-HXk5I0JwpRQvXPALBqtJ
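For the deployed webui fork, generation can be driven over HTTP; a hedged sketch, assuming the upstream webui's `/sdapi/v1/txt2img` REST endpoint (enabled with the `--api` flag) is retained in the fork, with an illustrative port and payload:

```python
# A minimal sketch, assuming the webui is running locally with --api enabled.
import base64
import requests

payload = {"prompt": "pixel art portrait of a goblin merchant", "steps": 20}
response = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
response.raise_for_status()

# The endpoint returns generated images as base64-encoded PNGs.
with open("portrait.png", "wb") as f:
    f.write(base64.b64decode(response.json()["images"][0]))
```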
## Sound Generation
Used for ambient sounds in the world, as well as sound effects attached to objects and mobs
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/diffsound
### Current Implementation
DiffSound
- https://github.com/yangdongchao/Text-to-sound-Synthesis
### Future Implementation Candidates
Audio Diffusion -- similar to DiffSound, but the samples are much better, probably due to the datasets
- https://github.com/archinetai/audio-diffusion-pytorch#text-conditional-generation
- https://felixkreuk.github.io/text2audio_arxiv_samples/
- https://github.com/lucidrains/audiolm-pytorch (not yet working)
## Music Generation (Audio)
Used for all music generated in Webaverse
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/archinetai/audio-diffusion-pytorch
This version of Audio Diffusion features models fine-tuned on specific pieces:
- https://github.com/teticio/audio-diffusion
- https://huggingface.co/spaces/teticio/audio-diffusion
## Music Generation (MIDI)
Used for ambient audio generation in Webaverse. May be much faster to generate and process consistent long pieces than audio-only methods.
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/sjhan91/Mixture2Music_Official
- https://salu133445.github.io/musegan
## Text to 3D
Used for generation of all 3D objects and features in the world, based on descriptions, images or general class types
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/stable-dreamfusion
### Current Implementation
Stable Dreamfusion - https://github.com/ashawkey/stable-dreamfusion
- SOTA, uses SD
- Slow (50 minutes/generation) and quality isn't great
GET3D - https://github.com/nv-tlabs/GET3D
- Relatively fast, ~ 1 min / model
- Needs research on conditional generation
### Future Implementation Candidates
- https://nv-tlabs.github.io/LION/ -- not released yet
## Text to Motion
Used for generation of humanoid animations
**STATUS: DEPLOYED**
REPO: https://github.com/webaverse/motion-diffusion-model
### Current Implementation
Motion Diffusion
- https://github.com/webaverse/motion-diffusion-model
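Sampling from the deployed model is driven by the repo's CLI; a hedged sketch of invoking it from Python, where the flag names follow the upstream README and the checkpoint path is hypothetical (it depends on which pretrained model has been downloaded):

```python
# A minimal sketch; the checkpoint path below is hypothetical.
import subprocess

subprocess.run(
    [
        "python", "-m", "sample.generate",
        "--model_path", "./save/humanml_trans_enc_512/model000200000.pt",
        "--text_prompt", "a person jumps over an obstacle",
    ],
    cwd="motion-diffusion-model",  # run from the repo checkout
    check=True,
)
```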
### Other Implementations
- https://github.com/mingyuan-zhang/MotionDiffuse -- seems very similar
## Image to Text & Visual Question Answering
Used for describing images so that the game can incorporate user images into the story, analyze screenshots, generate labels for training data or prompts for inverted generation
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/salesforce/BLIP - works really well for prompt-like captions, also does visual question answering
- https://github.com/webaverse/CLIP-Caption-Reward - detailed descriptions
- https://github.com/pharmapsychotic/clip-interrogator - does a really good job giving back prompts
- https://huggingface.co/dandelin/vilt-b32-finetuned-vqa
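The ViLT candidate above can be exercised through the `transformers` visual-question-answering pipeline; a minimal sketch with an illustrative image path and question:

```python
# A minimal sketch; "screenshot.png" is an illustrative local image path.
from transformers import pipeline

vqa = pipeline("visual-question-answering",
               model="dandelin/vilt-b32-finetuned-vqa")

answers = vqa(image="screenshot.png",
              question="How many characters are in the scene?")
print(answers)  # candidate answers with confidence scores
```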
## Audio to Text
Used for captioning or describing audio or sounds
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/TheoCoombes/ClipCap -- uses CLAP from LAION to do many things, including captioning audio and audio2img
## 2D Image Animation
Used for generating animation from 2D images, especially synced with audio or text for characters and portraits
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/yoyo-nb/Thin-Plate-Spline-Motion-Model
## Model Rigging
Used for adding bones to objects, characters, mobs, and pets that don't have a rig
**STATUS: NOT DEPLOYED**
### Future Implementation Candidates
- https://github.com/zhan-xu/RigNet
<br />
# Datasets
### 3D Models
- http://yulanguo.me/dataset.html
- https://shapenet.org/
### Human Models
- https://github.com/open-mmlab/mmhuman3d/tree/main/configs/gta_human